[jira] [Work logged] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15815?focusedWorklogId=581562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581562
 ]

ASF GitHub Bot logged work on HDFS-15815:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 05:49
Start Date: 13/Apr/21 05:49
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2882:
URL: https://github.com/apache/hadoop/pull/2882#issuecomment-818456966


   Sure @jojochuang, Thanx 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581562)
Time Spent: 50m  (was: 40m)

>  if required storageType are unavailable, log the failed reason during 
> choosing Datanode
> 
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, 
> HDFS-15815.003.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> For better debugging, if the required storageTypes are unavailable, log the 
> failure reason "NO_REQUIRED_STORAGE_TYPE" when choosing a Datanode.
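
As a rough sketch of the intent (a minimal sketch with illustrative names, not 
the committed patch), the placement code would record the reason before giving 
up on a candidate:
{code:java}
// Minimal sketch, assuming illustrative stand-in types; not the actual
// BlockPlacementPolicy code. The point is to log NO_REQUIRED_STORAGE_TYPE
// instead of failing silently when no node offers the required storage type.
import java.util.EnumSet;
import java.util.logging.Logger;

class PlacementSketch {
  enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

  private static final Logger LOG = Logger.getLogger("BlockPlacement");

  static String chooseCandidate(StorageType required,
      EnumSet<StorageType> offered) {
    if (!offered.contains(required)) {
      LOG.fine("Failed to choose datanode: NO_REQUIRED_STORAGE_TYPE,"
          + " required=" + required);
      return null;
    }
    return "datanode-with-" + required;
  }
}
{code}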



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15912) Allow ProtobufRpcEngine to be extensible

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15912:
--
Labels: pull-request-available  (was: )

> Allow ProtobufRpcEngine to be extensible
> 
>
> Key: HDFS-15912
> URL: https://issues.apache.org/jira/browse/HDFS-15912
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ProtobufRpcEngine class doesn't allow new RpcEngine implementations 
> to extend some of its inner classes (e.g. Invoker and 
> Server.ProtoBufRpcInvoker). Also, some of its methods are long enough that 
> overriding them would result in a lot of code duplication (e.g. 
> Invoker#invoke and Server.ProtoBufRpcInvoker#call).
> When implementing a new RpcEngine, it would be helpful to reuse most of the 
> code already in ProtobufRpcEngine. This would allow new fields to be added to 
> the RPC header or message with minimal code changes.
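
As an illustration of the extensibility pattern being requested (a sketch with 
made-up names, not ProtobufRpcEngine's actual members): the long shared path 
stays in the base class while a protected hook lets a subclass customize one 
step, such as adding a field to the RPC header.
{code:java}
// Minimal sketch, assuming illustrative names; not the actual
// ProtobufRpcEngine internals.
class BaseEngine {
  // Protected hook a subclass can override without duplicating invoke().
  protected byte[] buildHeader(String method) {
    return ("method=" + method).getBytes();
  }

  // The long, shared logic stays in the base class.
  public final byte[] invoke(String method, byte[] payload) {
    byte[] header = buildHeader(method);
    byte[] frame = new byte[header.length + payload.length];
    System.arraycopy(header, 0, frame, 0, header.length);
    System.arraycopy(payload, 0, frame, header.length, payload.length);
    return frame;
  }
}

class ExtendedEngine extends BaseEngine {
  @Override
  protected byte[] buildHeader(String method) {
    // Adds a new header field without copying the invoke() path.
    return ("v=2;method=" + method).getBytes();
  }
}
{code}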



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15912) Allow ProtobufRpcEngine to be extensible

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15912?focusedWorklogId=581560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581560
 ]

ASF GitHub Bot logged work on HDFS-15912:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 05:41
Start Date: 13/Apr/21 05:41
Worklog Time Spent: 10m 
  Work Description: hchaverri opened a new pull request #2901:
URL: https://github.com/apache/hadoop/pull/2901


   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581560)
Remaining Estimate: 0h
Time Spent: 10m

> Allow ProtobufRpcEngine to be extensible
> 
>
> Key: HDFS-15912
> URL: https://issues.apache.org/jira/browse/HDFS-15912
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The ProtobufRpcEngine class doesn't allow new RpcEngine implementations 
> to extend some of its inner classes (e.g. Invoker and 
> Server.ProtoBufRpcInvoker). Also, some of its methods are long enough that 
> overriding them would result in a lot of code duplication (e.g. 
> Invoker#invoke and Server.ProtoBufRpcInvoker#call).
> When implementing a new RpcEngine, it would be helpful to reuse most of the 
> code already in ProtobufRpcEngine. This would allow new fields to be added to 
> the RPC header or message with minimal code changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319891#comment-17319891
 ] 

Fengnan Li commented on HDFS-15423:
---

[~elgoiri] Sure, I will create a new one.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15972) Fedbalance only copies data partially when there's existing opened file

2021-04-12 Thread Felix N (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319870#comment-17319870
 ] 

Felix N commented on HDFS-15972:


Hi [~LiJinglun], is this the expected behavior? During heavy write periods, this 
might lead to data loss.

> Fedbalance only copies data partially when there's existing opened file
> ---
>
> Key: HDFS-15972
> URL: https://issues.apache.org/jira/browse/HDFS-15972
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Felix N
>Priority: Major
>
> If there are opened files when fedbalance is run and data is being written to 
> these files, fedbalance might skip the newly written data.
> Steps to recreate the issue:
>  # Create a dummy file /test/file with some data: {{echo "start" | hdfs dfs 
> -appendToFile /test/file}}
>  # Start writing to the file: {{hdfs dfs -appendToFile /test/file}} but do 
> not stop writing
>  # Run fedbalance: {{hadoop fedbalance submit hdfs://ns1/test 
> hdfs://ns2/test}}
>  # Write something to the file while fedbalance is running, "end" for 
> example, then stop writing
>  # After fedbalance is done, {{hdfs://ns2/test/file}} should only contain 
> "start" while {{hdfs://ns1/user/hadoop/.Trash/Current/test/file}} contains 
> "start\nend"
> Fedbalance is run with default configs and arguments so no diff should happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Moved] (HDFS-15972) Fedbalance only copies data partially when there's existing opened file

2021-04-12 Thread Felix N (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix N moved HADOOP-17634 to HDFS-15972:
-

Key: HDFS-15972  (was: HADOOP-17634)
Project: Hadoop HDFS  (was: Hadoop Common)

> Fedbalance only copies data partially when there's existing opened file
> ---
>
> Key: HDFS-15972
> URL: https://issues.apache.org/jira/browse/HDFS-15972
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Felix N
>Priority: Major
>
> If there are opened files when fedbalance is run and data is being written to 
> these files, fedbalance might skip the newly written data.
> Steps to recreate the issue:
>  # Create a dummy file /test/file with some data: {{echo "start" | hdfs dfs 
> -appendToFile /test/file}}
>  # Start writing to the file: {{hdfs dfs -appendToFile /test/file}} but do 
> not stop writing
>  # Run fedbalance: {{hadoop fedbalance submit hdfs://ns1/test 
> hdfs://ns2/test}}
>  # Write something to the file while fedbalance is running, "end" for 
> example, then stop writing
>  # After fedbalance is done, {{hdfs://ns2/test/file}} should only contain 
> "start" while {{hdfs://ns1/user/hadoop/.Trash/Current/test/file}} contains 
> "start\nend"
> Fedbalance is run with default configs and arguments so no diff should happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581523&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581523
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:51
Start Date: 13/Apr/21 03:51
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r612107351



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
##
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)

Review comment:
   I'm sorry, I don't quite understand what you mean. Could you please give 
me some specific suggestions? Thank you very much.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581523)
Time Spent: 1h  (was: 50m)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581522
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:47
Start Date: 13/Apr/21 03:47
Worklog Time Spent: 10m 
  Work Description: tomscut commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r612106334



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
##
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)
+  throws IOException {
+final ServletContext context = getServletContext();
+NameNode nn = NameNodeHttpServer.getNameNodeFromContext(context);
+BlockManager bm = nn.getNamesystem().getBlockManager();
+List<Node> leaves = bm.getDatanodeManager().getNetworkTopology()
+.getLeaves(NodeBase.ROOT);
+
+response.setContentType("text/plain; charset=UTF-8");
+try (PrintStream out = new PrintStream(
+response.getOutputStream(), false, "UTF-8")) {
+  printTopology(out, leaves);
+} catch (Throwable t) {
+  String errMsg = "Print network topology failed. "
+  + StringUtils.stringifyException(t);
+  response.sendError(HttpServletResponse.SC_GONE, errMsg);
+  throw new IOException(errMsg);
+} finally {
+  response.getOutputStream().close();
+}
+  }
+
+  /**
+   * Display each rack and the nodes assigned to that rack, as determined
+   * by the NameNode, in a hierarchical manner.  The nodes and racks are
+   * sorted alphabetically.
+   *
+   * @param stream print stream
+   * @param leaves leaves nodes under base scope
+   */
+  public void printTopology(PrintStream stream, List<Node> leaves) {
+if (leaves.size() == 0) {
+  stream.print("No DataNodes");
+  return;
+}
+
+// Build a map of rack -> nodes from the datanode report
+HashMap<String, TreeSet<String>> tree = new HashMap<String, TreeSet<String>>();
+for(Node dni : leaves) {
+  String location = dni.getNetworkLocation();
+  String name = dni.getName();
+
+  if(!tree.containsKey(location)) {

Review comment:
   Thanks @goiri for your careful review; I will fix these problems quickly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581522)
Time Spent: 50m  (was: 40m)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.

[jira] [Work logged] (HDFS-15621) Datanode DirectoryScanner uses excessive memory

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15621?focusedWorklogId=581520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581520
 ]

ASF GitHub Bot logged work on HDFS-15621:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:45
Start Date: 13/Apr/21 03:45
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #2849:
URL: https://github.com/apache/hadoop/pull/2849#issuecomment-818409032


   The spotbugs warning looks like a false positive to me.
   `Redundant nullcheck of file, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.compileReport(File,
 File, Collection, DirectoryScanner$ReportCompiler) Redundant null check at 
FsVolumeImpl.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.compileReport(File,
 File, Collection, DirectoryScanner$ReportCompiler) Redundant null check at 
FsVolumeImpl.java:[line 1477]`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581520)
Time Spent: 40m  (was: 0.5h)

> Datanode DirectoryScanner uses excessive memory
> ---
>
> Key: HDFS-15621
> URL: https://issues.apache.org/jira/browse/HDFS-15621
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 
> 2020-10-09 at 15.20.56.png
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We generally work to a rule of 1GB of heap on a datanode per 1M blocks. For nodes 
> with a lot of blocks, this can mean a lot of heap.
> We recently captured a heap dump of a DN with about 22M blocks and found only 
> about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is taken 
> by the DirectoryScanner ScanInfo objects. Most of this memory was allocated to 
> strings.
> Checking the strings in question, we can see two strings per scanInfo, 
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785_106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part 
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will 
> be the same for every block in the block pool, as the scanner is only 
> concerned with finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather than 
> "subdir28/subdir17", and then construct the path when it is requested via the 
> getter.
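
A minimal sketch of that layout (illustrative names, not the actual ScanInfo 
change): keep two subdir indexes and the block id per replica, and rebuild the 
full path only in the getter.
{code:java}
// Minimal sketch, assuming illustrative names; not the actual ScanInfo code.
class CompactScanInfo {
  private final int subdir1;   // e.g. 28 from ".../finalized/subdir28/..."
  private final int subdir2;   // e.g. 17 from ".../subdir28/subdir17/..."
  private final long blockId;  // e.g. 1180438785

  CompactScanInfo(int subdir1, int subdir2, long blockId) {
    this.subdir1 = subdir1;
    this.subdir2 = subdir2;
    this.blockId = blockId;
  }

  // The shared prefix ".../current/finalized" is stored once per block pool
  // rather than once per replica, so each ScanInfo holds only primitives.
  String getBlockPath(String finalizedDir) {
    return finalizedDir + "/subdir" + subdir1 + "/subdir" + subdir2
        + "/blk_" + blockId;
  }
}
{code}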



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319861#comment-17319861
 ] 

Íñigo Goiri commented on HDFS-15423:


I reverted the merge.
[~fengnanli] do you mind creating a new PR fixing the compilation issue?
I'm curious why Yetus didn't catch this.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581518&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581518
 ]

ASF GitHub Bot logged work on HDFS-15423:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:44
Start Date: 13/Apr/21 03:44
Worklog Time Spent: 10m 
  Work Description: goiri opened a new pull request #2900:
URL: https://github.com/apache/hadoop/pull/2900


   Reverts apache/hadoop#2605


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581518)
Time Spent: 6h 20m  (was: 6h 10m)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581519
 ]

ASF GitHub Bot logged work on HDFS-15423:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:44
Start Date: 13/Apr/21 03:44
Worklog Time Spent: 10m 
  Work Description: goiri merged pull request #2900:
URL: https://github.com/apache/hadoop/pull/2900


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581519)
Time Spent: 6.5h  (was: 6h 20m)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15970) Print network topology on the web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581517&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581517
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:43
Start Date: 13/Apr/21 03:43
Worklog Time Spent: 10m 
  Work Description: goiri commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r611908007



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
##
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)
+  throws IOException {
+final ServletContext context = getServletContext();
+NameNode nn = NameNodeHttpServer.getNameNodeFromContext(context);
+BlockManager bm = nn.getNamesystem().getBlockManager();
+List<Node> leaves = bm.getDatanodeManager().getNetworkTopology()
+.getLeaves(NodeBase.ROOT);
+
+response.setContentType("text/plain; charset=UTF-8");
+try (PrintStream out = new PrintStream(
+response.getOutputStream(), false, "UTF-8")) {
+  printTopology(out, leaves);
+} catch (Throwable t) {
+  String errMsg = "Print network topology failed. "
+  + StringUtils.stringifyException(t);
+  response.sendError(HttpServletResponse.SC_GONE, errMsg);
+  throw new IOException(errMsg);
+} finally {
+  response.getOutputStream().close();
+}
+  }
+
+  /**
+   * Display each rack and the nodes assigned to that rack, as determined
+   * by the NameNode, in a hierarchical manner.  The nodes and racks are
+   * sorted alphabetically.
+   *
+   * @param stream print stream
+   * @param leaves leaves nodes under base scope
+   */
+  public void printTopology(PrintStream stream, List<Node> leaves) {
+if (leaves.size() == 0) {
+  stream.print("No DataNodes");
+  return;
+}
+
+// Build a map of rack -> nodes from the datanode report
+HashMap<String, TreeSet<String>> tree = new HashMap<String, TreeSet<String>>();

Review comment:
   Can we do:
   Map<String, TreeSet<String>> tree = new HashMap<String, TreeSet<String>>();

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
##
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import 

[jira] [Work logged] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15815?focusedWorklogId=581512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581512
 ]

ASF GitHub Bot logged work on HDFS-15815:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:36
Start Date: 13/Apr/21 03:36
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #2882:
URL: https://github.com/apache/hadoop/pull/2882#issuecomment-818406777


   @ayushtkn fyi will merge later if no objections.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581512)
Time Spent: 40m  (was: 0.5h)

>  if required storageType are unavailable, log the failed reason during 
> choosing Datanode
> 
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, 
> HDFS-15815.003.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For better debugging, if the required storageTypes are unavailable, log the 
> failure reason "NO_REQUIRED_STORAGE_TYPE" when choosing a Datanode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.1.5

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.
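
In outline, the check amounts to the following (a sketch with an assumed 
decoder interface, not Hadoop's actual erasure coding API):
{code:java}
// Minimal sketch; the Decoder interface is an assumption for illustration.
import java.util.Arrays;

class EcVerifySketch {
  interface Decoder {
    // Assumed: reconstructs the unit at erasedIndex from the other units.
    byte[] decodeUnit(byte[][] units, int erasedIndex);
  }

  // RS-6-3 example: after reconstructing [d1, p1] from [d0, d2..d5, p0],
  // re-decode d0 from [d1, d2..d5, p1] and compare with the original d0.
  static boolean verify(Decoder decoder, byte[] originalD0,
      byte[][] unitsWithoutD0) {
    byte[] decodedD0 = decoder.decodeUnit(unitsWithoutD0, 0);
    // A mismatch means the reconstruction went wrong; the task then fails
    // and is retried by the NameNode.
    return Arrays.equals(originalD0, decodedD0);
  }
}
{code}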



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581505
 ]

ASF GitHub Bot logged work on HDFS-15714:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:11
Start Date: 13/Apr/21 03:11
Worklog Time Spent: 10m 
  Work Description: PHILO-HE commented on pull request #2655:
URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818398537


   1) Yes, the LevelDB-based AliasMap is recommended to users, and the text-based 
AliasMap is just for the purpose of unit tests. In this patch, we made few code 
changes to the AliasMap. You may note that it was initially introduced by the 
community a few years ago.
   
   2) For namenode HA, we have not tested this feature on that. I think there 
are two main considerations. Firstly, the mount operation should be recovered on 
the new NN, so that the mounted remote storages are "visible" to the new active 
NN. Since we log mount info in the edit log for each mount request, this may not 
be a problem. Secondly, key info currently kept in memory should be available to 
other NNs, e.g., key tracking info used in syncing data to remote storage to 
guarantee data consistency even when the active NN is shifted.
   
   Frankly speaking, provided storage is still an experimental feature, so 
there may still be a large gap to productization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581505)
Time Spent: 2.5h  (was: 2h 20m)

> HDFS Provided Storage Read/Write Mount Support On-the-fly
> -
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15714-01.patch, 
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. 
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By 
> configuring external storage with the PROVIDED tag for a DataNode, users can 
> enable applications to access externally stored data from the HDFS side. 
> However, there are two issues that need to be addressed. Firstly, mounting 
> external storage on-the-fly, namely dynamic mount, is lacking. It needs to be 
> supported in order to flexibly combine HDFS with an external storage at 
> runtime. Secondly, PS write is not supported by current HDFS, but in real 
> applications it is common to transfer data bi-directionally for read/write 
> between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and 
> dynamic mount support for both read & write. Please note that several JIRAs 
> have been filed in the community for these topics. Our work is based on this 
> previous community work, with a new design & implementation to support a 
> writeBack mount and to enable admins to add any mount on-the-fly. We appreciate 
> those folks in the community for their great contribution! See their pending 
> JIRAs: HDFS-14805 & HDFS-12090.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581504
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 03:10
Start Date: 13/Apr/21 03:10
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #2868:
URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818398223


   HDFS-15940 has fixed TestBlockRecovery


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581504)
Time Spent: 10h  (was: 9h 50m)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581501
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 02:46
Start Date: 13/Apr/21 02:46
Worklog Time Spent: 10m 
  Work Description: jojochuang merged pull request #2868:
URL: https://github.com/apache/hadoop/pull/2868


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581501)
Time Spent: 9h 40m  (was: 9.5h)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581502&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581502
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 02:46
Start Date: 13/Apr/21 02:46
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on pull request #2868:
URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818391251


   Thanks! @ferhui 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581502)
Time Spent: 9h 50m  (was: 9h 40m)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 9h 50m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15759:
---
Fix Version/s: 3.2.3

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.2.3
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319824#comment-17319824
 ] 

Hui Fei commented on HDFS-15423:


Hi, compilation fails on trunk. Maybe it's related to this.

RouterWebHdfsMethods#chooseDatanode
{code:java}
resolvedNs = rpcServer.getCreateLocation(path).getNameserviceId();
{code}
But getCreateLocation is defined as follows, so it is being called with the wrong arguments.
{code:java}

RemoteLocation getCreateLocation(
final String src, final List<RemoteLocation> locations)
throws IOException {
...
}{code}
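
A sketch of what a corrected call could look like (the resolver call below is 
an assumption for illustration, not the committed fix):
{code:java}
// Sketch only: resolve the mount-table locations for the path first, then
// pass them to getCreateLocation. getLocationsForPath is assumed here to be
// the router's resolver entry point.
final List<RemoteLocation> locations =
    rpcServer.getLocationsForPath(path, true);
resolvedNs = rpcServer.getCreateLocation(path, locations).getNameserviceId();
{code}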

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei reopened HDFS-15423:


> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}} and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct, as it 
> should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581492
 ]

ASF GitHub Bot logged work on HDFS-15759:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 02:01
Start Date: 13/Apr/21 02:01
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #2868:
URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818375830


   @jojochuang Thanks.
   The failed tests are unrelated; they passed locally except TestBlockRecovery, 
and TestBlockRecovery fails without this PR, so I think it's not related to 
this PR. I will check it on trunk.
   +1 for this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581492)
Time Spent: 9.5h  (was: 9h 20m)

> EC: Verify EC reconstruction correctness on DataNode
> 
>
> Key: HDFS-15759
> URL: https://issues.apache.org/jira/browse/HDFS-15759
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, ec, erasure-coding
>Affects Versions: 3.4.0
>Reporter: Toshihiko Uchida
>Assignee: Toshihiko Uchida
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 9.5h
>  Remaining Estimate: 0h
>
> EC reconstruction on DataNode has caused data corruption: HDFS-14768, 
> HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and 
> the corruption is neither detected nor auto-healed by HDFS. It is obviously 
> hard for users to monitor data integrity by themselves, and even if they find 
> corrupted data, it is difficult or sometimes impossible to recover them.
> To prevent further data corruption issues, this feature proposes a simple and 
> effective way to verify EC reconstruction correctness on DataNode at each 
> reconstruction process.
> It verifies the correctness of the outputs decoded from the inputs as follows:
> 1. Decode an input with the outputs;
> 2. Compare the decoded input with the original input.
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs 
> [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from 
> [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0.
> When an EC reconstruction task goes wrong, the comparison will fail with high 
> probability.
> The task will then also fail and be retried by the NameNode.
> The next reconstruction will succeed if the condition that triggered the 
> failure is gone.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581491
 ]

ASF GitHub Bot logged work on HDFS-15714:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 01:58
Start Date: 13/Apr/21 01:58
Worklog Time Spent: 10m 
  Work Description: Zhangshunyu edited a comment on pull request #2655:
URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818374335


   Currently, the alias map is based on LevelDB, and it does not support namenode 
HA, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581491)
Time Spent: 2h 20m  (was: 2h 10m)

> HDFS Provided Storage Read/Write Mount Support On-the-fly
> -
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15714-01.patch, 
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. 
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By 
> configuring external storage with the PROVIDED tag for a DataNode, users can 
> enable applications to access externally stored data from the HDFS side. 
> However, there are two issues that need to be addressed. Firstly, mounting 
> external storage on-the-fly, namely dynamic mount, is lacking. It needs to be 
> supported so that HDFS can be flexibly combined with an external storage at 
> runtime. Secondly, PS write is not supported by current HDFS, yet in real 
> applications it is common to transfer data bi-directionally for read/write 
> between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and 
> dynamic mount support for both read & write. Please note that several JIRAs 
> have been filed in the community for these topics. Our work is based on this 
> previous community work, with a new design & implementation to support a 
> so-called writeBack mount and to enable admins to add any mount on-the-fly. 
> We appreciate those folks in the community for their great contribution! See 
> their pending JIRAs: HDFS-14805 & HDFS-12090.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581490
 ]

ASF GitHub Bot logged work on HDFS-15714:
-

Author: ASF GitHub Bot
Created on: 13/Apr/21 01:57
Start Date: 13/Apr/21 01:57
Worklog Time Spent: 10m 
  Work Description: Zhangshunyu commented on pull request #2655:
URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818374335


   Currently, the alias map is based on LevelDB, and it does not support 
NameNode HA, right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581490)
Time Spent: 2h 10m  (was: 2h)

> HDFS Provided Storage Read/Write Mount Support On-the-fly
> -
>
> Key: HDFS-15714
> URL: https://issues.apache.org/jira/browse/HDFS-15714
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, namenode
>Affects Versions: 3.4.0
>Reporter: Feilong He
>Assignee: Feilong He
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15714-01.patch, 
> HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. 
> In HDFS-9806, the PROVIDED storage type was introduced to HDFS. By 
> configuring external storage with the PROVIDED tag for a DataNode, users can 
> enable applications to access externally stored data from the HDFS side. 
> However, there are two issues that need to be addressed. Firstly, mounting 
> external storage on-the-fly, namely dynamic mount, is lacking. It needs to be 
> supported so that HDFS can be flexibly combined with an external storage at 
> runtime. Secondly, PS write is not supported by current HDFS, yet in real 
> applications it is common to transfer data bi-directionally for read/write 
> between HDFS and external storage.
> Through this JIRA, we are presenting our work on PS write support and 
> dynamic mount support for both read & write. Please note that several JIRAs 
> have been filed in the community for these topics. Our work is based on this 
> previous community work, with a new design & implementation to support a 
> so-called writeBack mount and to enable admins to add any mount on-the-fly. 
> We appreciate those folks in the community for their great contribution! See 
> their pending JIRAs: HDFS-14805 & HDFS-12090.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15970) Print network topology on the web

2021-04-12 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-15970:
---
Summary: Print network topology on the web  (was: Print network topology on 
web)

> Print network topology on the web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319724#comment-17319724
 ] 

Fengnan Li commented on HDFS-15423:
---

Thanks [~elgoiri] [~ayushtkn] for the review! Let's see whether it can fix 
[HDFS-15878|https://issues.apache.org/jira/browse/HDFS-15878]

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one 
> from the list via {{getRandomDatanode}}. This logic doesn't seem correct, as 
> it should pick a DN from the specific sub-cluster(s) of the input {{path}}.
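
As a standalone sketch of the intended selection logic: resolve the path to
its sub-cluster first, then pick a DN only from that sub-cluster's datanode
report. The mount table and reports below are made-up stand-ins, not the
Router's real data structures.

{code:java}
import java.util.List;
import java.util.Map;
import java.util.Random;

public class ChooseDatanodeSketch {
  // Hypothetical datanode reports, one list per sub-cluster (nameservice).
  static final Map<String, List<String>> DATANODE_REPORTS = Map.of(
      "ns0", List.of("dn0-a", "dn0-b"),
      "ns1", List.of("dn1-a", "dn1-b"));

  // Hypothetical mount table mapping path prefixes to sub-clusters.
  static final Map<String, String> MOUNT_TABLE = Map.of(
      "/data", "ns0",
      "/logs", "ns1");

  static String chooseDatanode(String path) {
    // Resolve the path to its sub-cluster instead of using the union of all.
    String ns = MOUNT_TABLE.entrySet().stream()
        .filter(e -> path.startsWith(e.getKey()))
        .map(Map.Entry::getValue)
        .findFirst()
        .orElseThrow(() -> new IllegalArgumentException("no mount for " + path));
    // Pick a random DN from that sub-cluster's report only.
    List<String> dns = DATANODE_REPORTS.get(ns);
    return dns.get(new Random().nextInt(dns.size()));
  }

  public static void main(String[] args) {
    System.out.println(chooseDatanode("/data/file1")); // always a dn0-* node
  }
}
{code}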



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatible between 3.x and 2.6

2021-04-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319704#comment-17319704
 ] 

Ayush Saxena commented on HDFS-15660:
-

{quote}As version 3.1, 3.2 and 3.3 already contain the new storage type, it 
should be okay to do the upgrade. So I don't cherry-pick to other branches.
{quote}
 
 HDFS-15025 added a new storage type and that is in 3.4.0, so IMO we should 
cherry-pick this to the 3.x branches. Otherwise the problem that was faced for 
the PROVIDED storage type will happen for this one too, and likewise whenever 
some new storage type is added in the future.
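
As a plain-Java illustration (no protobuf involved) of why this bites old
clients: in proto2, an unknown value received for a required enum field is
dropped during parsing, leaving the message uninitialized, which surfaces as
the UninitializedMessageException quoted in this issue. The enum below mirrors
the pre-PROVIDED storage types and is illustrative only.

{code:java}
public class EnumCompatSketch {
  // An old (pre-Hadoop-3) client's view of StorageTypeProto.
  enum OldStorageType { DISK, SSD, ARCHIVE, RAM_DISK }

  public static void main(String[] args) {
    String wireValue = "PROVIDED"; // sent by a Hadoop 3 NameNode
    try {
      OldStorageType parsed = OldStorageType.valueOf(wireValue);
      System.out.println("parsed: " + parsed);
    } catch (IllegalArgumentException e) {
      // proto2 analog: the required field "type" ends up unset, so the
      // message fails its initialization check on the old client.
      System.out.println("Message missing required fields: "
          + "summary.typeQuotaInfos.typeQuotaInfo[*].type");
    }
  }
}
{code}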

> StorageTypeProto is not compatible between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DNs were still 
> on 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new 
> PROVIDED storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>         at 
> 

[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Jira


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319676#comment-17319676
 ] 

Íñigo Goiri commented on HDFS-15423:


Thanks [~fengnanli] for the improvement.
Merged PR 2605.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one 
> from the list via {{getRandomDatanode}}. This logic doesn't seem correct, as 
> it should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatible between 3.x and 2.6

2021-04-12 Thread Jim Brennan (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319677#comment-17319677
 ] 

Jim Brennan commented on HDFS-15660:


I share [~weichiu]'s confusion.   If this change was put in trunk and all 
branch-2 branches, I don't understand why we would skip branch-3.1, branch-3.2, 
and branch-3.3?  It may not be strictly needed, but shouldn't we keep the 
change consistent across branches?


> StorageTypeProto is not compatible between 3.x and 2.6
> ---
>
> Key: HDFS-15660
> URL: https://issues.apache.org/jira/browse/HDFS-15660
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1
>Reporter: Ryan Wu
>Assignee: Ryan Wu
>Priority: Major
> Fix For: 2.9.3, 3.4.0, 2.10.2
>
> Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch
>
>
> In our case, when the NN had been upgraded to 3.1.3 while the DNs were still 
> on 2.6, we found that when Hive called the getContentSummary method, the 
> client and server were not compatible because Hadoop 3 added the new 
> PROVIDED storage type.
> {code:java}
> // code placeholder
> 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while 
> invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over 
> x/x:8020. Trying to fail over immediately.
> java.io.IOException: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713)
>         at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109)
>         at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317)
>         at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289)
>         at 
> org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
>         at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
>         at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)
> Caused by: com.google.protobuf.ServiceException: 
> com.google.protobuf.UninitializedMessageException: Message missing required 
> fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272)
>         at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816)
>         ... 23 more
> Caused by: com.google.protobuf.UninitializedMessageException: Message missing 
> required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type
>         at 
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65392)
>         at 
> 

[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581288
 ]

ASF GitHub Bot logged work on HDFS-15423:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 19:42
Start Date: 12/Apr/21 19:42
Worklog Time Spent: 10m 
  Work Description: goiri merged pull request #2605:
URL: https://github.com/apache/hadoop/pull/2605


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581288)
Time Spent: 6h 10m  (was: 6h)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one 
> from the list via {{getRandomDatanode}}. This logic doesn't seem correct, as 
> it should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2021-04-12 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-15423.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one 
> from the list via {{getRandomDatanode}}. This logic doesn't seem correct, as 
> it should pick a DN from the specific sub-cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15971) Make mkstemp cross platform

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15971?focusedWorklogId=581287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581287
 ]

ASF GitHub Bot logged work on HDFS-15971:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 19:42
Start Date: 12/Apr/21 19:42
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2898:
URL: https://github.com/apache/hadoop/pull/2898#issuecomment-818083196


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 33s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   2m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   2m 44s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  mvnsite  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  53m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  cc  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  golang  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  cc  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  golang  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  13m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  31m 48s |  |  hadoop-hdfs-native-client in 
the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 107m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2898 |
   | Optional Tests | dupname asflicense compile cc mvnsite javac unit 
codespell golang |
   | uname | Linux 939cc4c9ed63 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2b1da158404546225a694691400c5271d4f631ac |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/testReport/ |
   | Max. process+thread count | 713 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581287)
Time Spent: 20m  (was: 10m)

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: 

[jira] [Work logged] (HDFS-15970) Print network topology on web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581232
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 18:31
Start Date: 12/Apr/21 18:31
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#issuecomment-818033898


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 54s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m  5s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  16m 17s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 12s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  6s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 53s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 
1 fixed = 6 total (was 6)  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 44s |  |  the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 10s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  15m 56s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 233m 16s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 319m 39s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys |
   |   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
   |   | hadoop.hdfs.qjournal.server.TestJournalNodeSync |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2896 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 3e68afa520ab 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 
05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | 

[jira] [Work logged] (HDFS-15971) Make mkstemp cross platform

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15971?focusedWorklogId=581199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581199
 ]

ASF GitHub Bot logged work on HDFS-15971:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 17:53
Start Date: 12/Apr/21 17:53
Worklog Time Spent: 10m 
  Work Description: GauthamBanasandra opened a new pull request #2898:
URL: https://github.com/apache/hadoop/pull/2898


   * mkstemp isn't available in Visual C++. This PR adds the necessary
     cross-platform implementation of mkstemp.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581199)
Remaining Estimate: 0h
Time Spent: 10m

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15971) Make mkstemp cross platform

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15971:
--
Labels: pull-request-available  (was: )

> Make mkstemp cross platform
> ---
>
> Key: HDFS-15971
> URL: https://issues.apache.org/jira/browse/HDFS-15971
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++
>Affects Versions: 3.4.0
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> mkstemp isn't available in Visual C++. Need to make it cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15971) Make mkstemp cross platform

2021-04-12 Thread Gautham Banasandra (Jira)
Gautham Banasandra created HDFS-15971:
-

 Summary: Make mkstemp cross platform
 Key: HDFS-15971
 URL: https://issues.apache.org/jira/browse/HDFS-15971
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: libhdfs++
Affects Versions: 3.4.0
Reporter: Gautham Banasandra
Assignee: Gautham Banasandra


mkstemp isn't available in Visual C++. Need to make it cross platform.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=581140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581140
 ]

ASF GitHub Bot logged work on HDFS-15785:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 16:50
Start Date: 12/Apr/21 16:50
Worklog Time Spent: 10m 
  Work Description: fengnanli commented on a change in pull request #2639:
URL: https://github.com/apache/hadoop/pull/2639#discussion_r574117112



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java
##
@@ -647,6 +634,58 @@ public static String addKeySuffixes(String key, String... 
suffixes) {
   getNNLifelineRpcAddressesForCluster(Configuration conf)
   throws IOException {
 
+    Collection<String> parentNameServices = getParentNameServices(conf);
+
+    return getAddressesForNsIds(conf, parentNameServices, null,
+        DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY);
+  }
+
+  //
+  /**
+   * Returns the configured address for all NameNodes in the cluster.
+   * This is similar with DFSUtilClient.getAddressesForNsIds()
+   * but can access DFSConfigKeys.
+   *
+   * @param conf configuration
+   * @param defaultAddress default address to return in case key is not found.
+   * @param keys Set of keys to look for in the order of preference
+   *
+   * @return a map(nameserviceId to map(namenodeId to InetSocketAddress))
+   */
+  static Map<String, Map<String, InetSocketAddress>> getAddressesForNsIds(

Review comment:
   Can we try this to reduce the code duplication? Overload 
`DFSUtilClient.getAddressesForNsIds()` by adding a boolean parameter 
indicating whether to resolve (the value is fetched from the config). 
   Inside `DFSUtilClient.getAddressesForNameserviceId`, add another overload 
with the boolean and have the current one delegate with the value false. If 
the flag is true, do the DNS resolution and return the resolved addresses.
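
   A minimal standalone sketch of that overload pattern, using simplified
stand-ins rather than the real DFSUtilClient signatures:

{code:java}
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

class AddressResolutionSketch {

  /** Existing entry point keeps its behavior by delegating with false. */
  static Map<String, InetSocketAddress> getAddresses(String host, int port)
      throws UnknownHostException {
    return getAddresses(host, port, false);
  }

  /**
   * New overload: when resolveDns is true, expand the configured host name
   * into one namenode entry per IP address behind the DNS record.
   */
  static Map<String, InetSocketAddress> getAddresses(
      String host, int port, boolean resolveDns) throws UnknownHostException {
    Map<String, InetSocketAddress> result = new LinkedHashMap<>();
    if (!resolveDns) {
      result.put("nn0", InetSocketAddress.createUnresolved(host, port));
      return result;
    }
    int i = 0;
    for (InetAddress addr : InetAddress.getAllByName(host)) {
      result.put("nn" + i++, new InetSocketAddress(addr.getHostAddress(), port));
    }
    return result;
  }

  public static void main(String[] args) throws UnknownHostException {
    System.out.println(getAddresses("localhost", 8020, true));
  }
}
{code}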

##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
##
@@ -1557,6 +1557,17 @@
   public static final double
   DFS_DATANODE_RESERVE_FOR_ARCHIVE_DEFAULT_PERCENTAGE_DEFAULT = 0.0;
 
+
+  public static final String
+  DFS_NAMESERVICES_RESOLUTION_ENABLED =

Review comment:
   If we maintain only one config across NN, QJM, ZKFC and DN, this is an 
issue since the other three don't support DNS yet. I am thinking about how to 
do it; it requires some refactoring in places such as 
`DFSUtil.getSuffixIDs` (used by ZKFC). I will follow up on this soon. For now 
we can keep a separate config for the DN only as a short-term solution.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581140)
Time Spent: 2.5h  (was: 2h 20m)

> Datanode to support using DNS to resolve nameservices to IP addresses to get 
> list of namenodes
> --
>
> Key: HDFS-15785
> URL: https://issues.apache.org/jira/browse/HDFS-15785
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, as HDFS supports observers, multiple standbys and routers, the 
> namenode hosts change frequently in large deployments. We can consider 
> supporting https://issues.apache.org/jira/browse/HDFS-14118 on the datanode 
> to reduce the need to update the config frequently on all datanodes. In that 
> case, datanodes and clients can use the same set of configs as well.
> Basically, we can resolve the DNS name and generate a namenode entry for 
> each IP behind it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk

2021-04-12 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319573#comment-17319573
 ] 

Fengnan Li commented on HDFS-15878:
---

Let's wait until HDFS-15423 is committed. Thanks.

> Flaky test 
> TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in 
> Trunk
> ---
>
> Key: HDFS-15878
> URL: https://issues.apache.org/jira/browse/HDFS-15878
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, rbf
>Reporter: Renukaprasad C
>Assignee: Fengnan Li
>Priority: Major
>
> ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: 
> 24.627 s <<< FAILURE! - in 
> org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate
> [ERROR] 
> testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate)
>   Time elapsed: 0.222 s  <<< ERROR!
> java.io.FileNotFoundException: File /test/testSyncable not found.
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>   at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.(WebHdfsFileSystem.java:2296)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.(WebHdfsFileSystem.java:2176)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556)
>   at 
> org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> /test/testSyncable not found.
>   at 
> 

[jira] [Work logged] (HDFS-15970) Print network topology on web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581108
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 16:00
Start Date: 12/Apr/21 16:00
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#issuecomment-817930124


   Can you extend this to the RBF UI as well, i.e. federationhealth.html and 
federationhealth.js?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 581108)
Time Spent: 20m  (was: 10m)

> Print network topology on web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15956) Provide utility class for FSNamesystem

2021-04-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-15956:

Labels:   (was: pull-request-available)

> Provide utility class for FSNamesystem
> --
>
> Key: HDFS-15956
> URL: https://issues.apache.org/jira/browse/HDFS-15956
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> With ever-growing functionality, FSNamesystem has become very large (~9k 
> lines of code) over time. We should provide a utility class and move as many 
> basic utility functions to the new class as we can.
> For any further suggestions, we can create sub-tasks of this Jira and work 
> on them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HDFS-15965) Please upgrade the log4j dependency to log4j2

2021-04-12 Thread Viraj Jasani (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HDFS-15965:

Comment: was deleted

(was: This is already being discussed on HADOOP-16206)

> Please upgrade the log4j dependency to log4j2
> -
>
> Key: HDFS-15965
> URL: https://issues.apache.org/jira/browse/HDFS-15965
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> The log4j dependency being used by hadoop-common is currently version 
> 1.2.17. Our Fortify scan picked up a couple of issues with this dependency. 
> Please update it to the latest log4j2 dependencies:
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-api</artifactId>
>   <version>2.14.1</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-core</artifactId>
>   <version>2.14.1</version>
> </dependency>
>  
> The slf4j dependency will need to be updated as well after you upgrade log4j 
> to log4j2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15970) Print network topology on web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15970:
--
Labels: pull-request-available  (was: )

> Print network topology on web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
>  Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15970) Print network topology on web

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=580974=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580974
 ]

ASF GitHub Bot logged work on HDFS-15970:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 13:11
Start Date: 12/Apr/21 13:11
Worklog Time Spent: 10m 
  Work Description: tomscut opened a new pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896


   JIRA: [HDFS-15970](https://issues.apache.org/jira/browse/HDFS-15970)
   
   In order to query the network topology information conveniently, we can 
print it on the web.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580974)
Remaining Estimate: 0h
Time Spent: 10m

> Print network topology on web
> -
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
>  Issue Type: Wish
>Reporter: tomscut
>Assignee: tomscut
>Priority: Minor
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can print 
> it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15970) Print network topology on web

2021-04-12 Thread tomscut (Jira)
tomscut created HDFS-15970:
--

 Summary: Print network topology on web
 Key: HDFS-15970
 URL: https://issues.apache.org/jira/browse/HDFS-15970
 Project: Hadoop HDFS
  Issue Type: Wish
Reporter: tomscut
Assignee: tomscut
 Attachments: hdfs-topology.jpg, hdfs-web.jpg

In order to query the network topology information conveniently, we can print 
it on the web.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15965) Please upgrade the log4j dependency to log4j2

2021-04-12 Thread Viraj Jasani (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319447#comment-17319447
 ] 

Viraj Jasani commented on HDFS-15965:
-

This is already being discussed on HADOOP-16206

> Please upgrade the log4j dependency to log4j2
> -
>
> Key: HDFS-15965
> URL: https://issues.apache.org/jira/browse/HDFS-15965
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient
>Affects Versions: 3.3.0, 3.2.1, 3.2.2
>Reporter: helen huang
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> The log4j dependency being used by hadoop-common is currently version 
> 1.2.17. Our Fortify scan picked up a couple of issues with this dependency. 
> Please update it to the latest log4j2 dependencies:
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-api</artifactId>
>   <version>2.14.1</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.logging.log4j</groupId>
>   <artifactId>log4j-core</artifactId>
>   <version>2.14.1</version>
> </dependency>
>  
> The slf4j dependency will need to be updated as well after you upgrade log4j 
> to log4j2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15969) DFSClient prints token information in a string format

2021-04-12 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319416#comment-17319416
 ] 

Hadoop QA commented on HDFS-15969:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 31m 
42s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
25s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
56s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 37s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 
49s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  2m 
49s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
55s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  
1s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 56s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| 

[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog

2021-04-12 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319413#comment-17319413
 ] 

tomscut commented on HDFS-15175:


Hi [~max2049], thanks for your work. The test case you provided does reproduce 
this problem. But can we also reproduce the issue by calling the relevant APIs 
(create/close/truncate)?

> Multiple CloseOp shared block instance causes the standby namenode to crash 
> when rolling editlog
> 
>
> Key: HDFS-15175
> URL: https://issues.apache.org/jira/browse/HDFS-15175
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.9.2
>Reporter: Yicong Cai
>Assignee: Wan Chang
>Priority: Critical
>  Labels: NameNode
> Attachments: HDFS-15175-trunk.1.patch
>
>
>  
> {panel:title=Crash exception}
> 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log 
> tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp 
> [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, 
> atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], 
> permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, 
> clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, 
> txid=32625024993]
>  java.io.IOException: File is not under construction: ..
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891)
>  at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:360)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361)
> {panel}
>  
> {panel:title=Editlog}
> <RECORD>
>   <OPCODE>OP_REASSIGN_LEASE</OPCODE>
>   <DATA>
>     <TXID>32625021150</TXID>
>     <LEASEHOLDER>DFSClient_NONMAPREDUCE_-969060727_197760</LEASEHOLDER>
>     <PATH>..</PATH>
>     <NEWHOLDER>DFSClient_NONMAPREDUCE_1000868229_201260</NEWHOLDER>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625023743</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816135883</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_TRUNCATE</OPCODE>
>   <DATA>
>     <TXID>32625024049</TXID>
>     <SRC>..</SRC>
>     <CLIENTNAME>DFSClient_NONMAPREDUCE_1000868229_201260</CLIENTNAME>
>     <CLIENTMACHINE>..</CLIENTMACHINE>
>     <NEWLENGTH>185818644</NEWLENGTH>
>     <TIMESTAMP>1581816136336</TIMESTAMP>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818648</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>   </DATA>
> </RECORD>
> ..
> <RECORD>
>   <OPCODE>OP_CLOSE</OPCODE>
>   <DATA>
>     <TXID>32625024993</TXID>
>     <LENGTH>0</LENGTH>
>     <INODEID>0</INODEID>
>     <PATH>..</PATH>
>     <REPLICATION>3</REPLICATION>
>     <MTIME>1581816138774</MTIME>
>     <ATIME>1581814760398</ATIME>
>     <BLOCKSIZE>536870912</BLOCKSIZE>
>     <CLIENT_NAME></CLIENT_NAME>
>     <CLIENT_MACHINE></CLIENT_MACHINE>
>     <OVERWRITE>false</OVERWRITE>
>     <BLOCK>
>       <BLOCK_ID>5568434562</BLOCK_ID>
>       <NUM_BYTES>185818644</NUM_BYTES>
>       <GENSTAMP>4495417845</GENSTAMP>
>     </BLOCK>
>     <PERMISSION_STATUS>
>       <USERNAME>da_music</USERNAME>
>       <GROUPNAME>hdfs</GROUPNAME>
>       <MODE>416</MODE>
>     </PERMISSION_STATUS>
>   </DATA>
> </RECORD>
> {panel}
>  
>  
> The block size should be 185818648 in the first CloseOp. After the truncate, 
> the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp sequence is 
> synchronized to the JournalNodes in the same batch, and the block used by the 
> two CloseOps is the same instance, which leaves the first CloseOp with the 
> wrong block size. When the SNN rolls the edit log, the TruncateOp does not 
> put the file into the UnderConstruction state. Then, when the second CloseOp 
> is replayed, the file is not in the UnderConstruction state, and the SNN 
> crashes.
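
For illustration, a minimal, self-contained sketch of the aliasing described 
above (hypothetical class names, not the actual FSEditLog types): both close 
records hold the same mutable Block, so the truncate's in-place mutation 
retroactively changes the first record.
{code:java}
// Hypothetical model of the bug: two "CloseOp" records alias one mutable Block.
final class Block {
  long numBytes;
  Block(long numBytes) { this.numBytes = numBytes; }
}

final class CloseRecord {
  final Block block; // aliased, not copied: the root cause
  CloseRecord(Block b) { this.block = b; }
  // One possible fix: snapshot the value at record-creation time instead,
  // e.g. this.block = new Block(b.numBytes);
}

public class SharedBlockDemo {
  public static void main(String[] args) {
    Block shared = new Block(185818648L);
    CloseRecord close1 = new CloseRecord(shared); // should persist 185818648
    shared.numBytes = 185818644L;                 // truncate mutates in place
    CloseRecord close2 = new CloseRecord(shared);
    // Both records now see 185818644; the first CloseOp's size is lost.
    System.out.println(close1.block.numBytes + " " + close2.block.numBytes);
  }
}
{code}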



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1

2021-04-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-15964:

Fix Version/s: (was: 3.4.0)
   (was: 3.3.0)

> Please update the okhttp version to 4.9.1
> -
>
> Key: HDFS-15964
> URL: https://issues.apache.org/jira/browse/HDFS-15964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, dfsclient, security
>Affects Versions: 3.3.0
>Reporter: helen huang
>Priority: Major
>
> Currently the okhttp version used by the HDFS client is 2.7.5. Our Fortify 
> scan flagged two issues with this version. Please update it to the latest 
> (okhttp3 4.9.1 at this point). Thanks!
> <dependency>
>   <groupId>com.squareup.okhttp3</groupId>
>   <artifactId>okhttp</artifactId>
>   <version>4.9.1</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15964) Please update the okhttp version to 4.9.1

2021-04-12 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319394#comment-17319394
 ] 

Steve Loughran commented on HDFS-15964:
---

Changes like this should be submitted as GitHub PRs. As this changes HDFS too, 
to ensure Yetus does the HDFS build/test, the PR needs to make some (any) 
change in the HDFS module. Adding a newline to the hdfs pom should be enough 
- we won't merge that.

Be aware: dependency changes are some of the most traumatic changes we can 
make. A single "change a line in a maven build" can break tests, cause 
downstream incompatibilities, trigger regressions in deployments which don't 
surface in unit tests, etc.

There is never a *just* update a JAR. It's "update the JAR, see what breaks, 
come up with a plan/timetable to fix". This one should be low risk. But things 
related to guava, jackson, and log4j are project-spanning minefields.

Further reading 
http://steveloughran.blogspot.com/2016/05/fear-of-dependencies.html

> Please update the okhttp version to 4.9.1
> -
>
> Key: HDFS-15964
> URL: https://issues.apache.org/jira/browse/HDFS-15964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, dfsclient, security
>Affects Versions: 3.3.0
>Reporter: helen huang
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> Currently the okhttp version used by the HDFS client is 2.7.5. Our Fortify 
> scan flagged two issues with this version. Please update it to the latest 
> (okhttp3 4.9.1 at this point). Thanks!
> <dependency>
>   <groupId>com.squareup.okhttp3</groupId>
>   <artifactId>okhttp</artifactId>
>   <version>4.9.1</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1

2021-04-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HDFS-15964:
--
Component/s: security
 build

> Please update the okhttp version to 4.9.1
> -
>
> Key: HDFS-15964
> URL: https://issues.apache.org/jira/browse/HDFS-15964
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, dfsclient, security
>Affects Versions: 3.3.0
>Reporter: helen huang
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> Currently the okhttp version used by the HDFS client is 2.7.5. Our Fortify 
> scan flagged two issues with this version. Please update it to the latest 
> (okhttp3 4.9.1 at this point). Thanks!
> <dependency>
>   <groupId>com.squareup.okhttp3</groupId>
>   <artifactId>okhttp</artifactId>
>   <version>4.9.1</version>
> </dependency>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15969) DFSClient prints token information in a string format

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15969:

Status: Patch Available  (was: Open)

> DFSClient prints token information in a string format 
> ---
>
> Key: HDFS-15969
> URL: https://issues.apache.org/jira/browse/HDFS-15969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15969.001.patch
>
>
> DFSClient prints token information in a string format; as this is sensitive 
> information, it must be moved to debug level, or can even be exempted from 
> debug level.
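
A hedged sketch of the kind of change being proposed (illustrative class and 
method names, not the actual DFSClient code): gate the message behind debug 
level and log only token metadata, never the raw token string.
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TokenLoggingExample {
  private static final Logger LOG =
      LoggerFactory.getLogger(TokenLoggingExample.class);

  // 'kind' and 'service' stand in for token metadata; the token's identifier
  // and password bytes are deliberately never logged.
  static void logTokenUse(String kind, String service) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Using delegation token kind={} service={}", kind, service);
    }
  }
}
{code}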



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15969) DFSClient prints token information in a string format

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15969:

Attachment: HDFS-15969.001.patch

> DFSClient prints token information in a string format 
> ---
>
> Key: HDFS-15969
> URL: https://issues.apache.org/jira/browse/HDFS-15969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15969.001.patch
>
>
> DFSClient prints token information in a string format; as this is sensitive 
> information, it must be moved to debug level, or can even be exempted from 
> debug level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15967) Improve the log for Short Circuit Local Reads

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15967:

Status: Patch Available  (was: In Progress)

> Improve the log for Short Circuit Local Reads
> -
>
> Key: HDFS-15967
> URL: https://issues.apache.org/jira/browse/HDFS-15967
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15967.001.patch
>
>
> Improve the log for Short Circuit Local Reads 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15968) Improve the log for the DecayRpcScheduler

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15968:

Status: Patch Available  (was: In Progress)

> Improve the log for the DecayRpcScheduler 
> --
>
> Key: HDFS-15968
> URL: https://issues.apache.org/jira/browse/HDFS-15968
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15968.001.patch
>
>
> Improve the log for the DecayRpcScheduler to make use of the SLF4J logger 
> factory
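
A minimal sketch of the change implied here, assuming the intent is to move 
the class to the SLF4J LoggerFactory (the example class name is illustrative):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DecayRpcSchedulerLogExample {
  // Before (commons-logging style):
  //   private static final Log LOG = LogFactory.getLog(DecayRpcScheduler.class);
  // After (SLF4J), which also enables {}-style parameterized logging:
  private static final Logger LOG =
      LoggerFactory.getLogger(DecayRpcSchedulerLogExample.class);

  void decay(double factor) {
    LOG.debug("Decaying call counts by factor {}", factor);
  }
}
{code}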



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15968) Improve the log for the DecayRpcScheduler

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15968:

Attachment: HDFS-15968.001.patch

> Improve the log for the DecayRpcScheduler 
> --
>
> Key: HDFS-15968
> URL: https://issues.apache.org/jira/browse/HDFS-15968
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15968.001.patch
>
>
> Improve the log for the DecayRpcScheduler to make use of the SLF4J logger 
> factory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15967) Improve the log for Short Circuit Local Reads

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bhavik Patel updated HDFS-15967:

Attachment: HDFS-15967.001.patch

> Improve the log for Short Circuit Local Reads
> -
>
> Key: HDFS-15967
> URL: https://issues.apache.org/jira/browse/HDFS-15967
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15967.001.patch
>
>
> Improve the log for Short Circuit Local Reads 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15969) DFSClient prints token information in a string format

2021-04-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319333#comment-17319333
 ] 

Ayush Saxena commented on HDFS-15969:
-

There are a bunch of Jiras removing this logging even from debug level, so you 
can chuck this one off.

> DFSClient prints token information in a string format 
> ---
>
> Key: HDFS-15969
> URL: https://issues.apache.org/jira/browse/HDFS-15969
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
>
> DFSClient prints token information in a string format; as this is sensitive 
> information, it must be moved to debug level, or can even be exempted from 
> debug level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15969) DFSClient prints token information in a string format

2021-04-12 Thread Bhavik Patel (Jira)
Bhavik Patel created HDFS-15969:
---

 Summary: DFSClient prints token information in a string format 
 Key: HDFS-15969
 URL: https://issues.apache.org/jira/browse/HDFS-15969
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Bhavik Patel
Assignee: Bhavik Patel


DFSClient prints token information in a string format; as this is sensitive 
information, it must be moved to debug level, or can even be exempted from 
debug level.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319274#comment-17319274
 ] 

Ayush Saxena commented on HDFS-15614:
-

Thanx [~shashikant] for the responses, I will bother you a bit more:
{quote}If a new directory is made snapshottable with the feature flag turned 
on, the .Trash directory gets created implicitly as a part of the 
allowSnapshot call. I don't think there is an ambiguity here.
{quote}
I think I cleared this up in my question itself, so here is a test for that:
{code:java}
  @Test
  public void testClientAmbiguity() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Enable the feature
    conf.setBoolean("dfs.namenode.snapshot.trashroot.enabled", true);
    try (MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).build()) {
      cluster.waitActive();
      final DistributedFileSystem dfs = cluster.getFileSystem();

      // Create two directories; allowSnapshot on one through DFS and on the
      // other through DFSAdmin.
      Path dir1 = new Path("/dir1");
      Path dir2 = new Path("/dir2");
      dfs.mkdirs(dir1);
      dfs.mkdirs(dir2);

      // allowSnapshot on dir1 through dfs.
      dfs.allowSnapshot(dir1);

      // allowSnapshot on dir2 through dfsadmin.
      DFSAdmin dfsAdmin = new DFSAdmin(conf);
      ToolRunner.run(conf, dfsAdmin, new String[]{"-allowSnapshot",
          dir2.toString()});

      // Check for the trash directory in dir1 (allowed through dfs).
      assertFalse(dfs.exists(new Path(dir1, FileSystem.TRASH_PREFIX))); // (1)

      // Check for the trash directory in dir2 (allowed through DFSAdmin).
      assertTrue(dfs.exists(new Path(dir2, FileSystem.TRASH_PREFIX)));

      // Failover/restart the namenode and such stuff.
      cluster.restartNameNodes();
      cluster.waitActive();

      // Nothing should change.

      // Check for the trash directory in dir1 (allowed through dfs).
      // Will fail here: things changed post restart of the namenode. The same
      // will happen with an upgrade as well.
      assertFalse(dfs.exists(new Path(dir1, FileSystem.TRASH_PREFIX))); // (1)

      assertTrue(dfs.exists(new Path(dir2, FileSystem.TRASH_PREFIX)));
    }
  }
{code}
And this fails. And yep, there is an ambiguity.
{quote}This is important for provisioning snapshot trash, so that the ordered 
snapshot deletion feature can be used when the system already has pre-existing 
snapshottable directories.
{quote}
How come is a client-side feature that important, that it can make the cluster 
go down in a critical situation like failover? Again, a test to show that:
{code:java}
  @Test
  public void testFailureAfterFailoverOrRestart() throws Exception {
    Configuration conf = new HdfsConfiguration();
    // Enable the feature
    conf.setBoolean("dfs.namenode.snapshot.trashroot.enabled", true);
    try (MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).build()) {
      cluster.waitActive();
      final DistributedFileSystem dfs = cluster.getFileSystem();

      // Create a directory.
      Path dir1 = new Path("/dir1");
      dfs.mkdirs(dir1);

      // allowSnapshot on dir1.
      dfs.allowSnapshot(dir1);

      // Set a namespace quota, so the implicit .Trash mkdirs will fail.
      dfs.setQuota(dir1, 1, 1);

      // Check that the cluster is working and happy.
      dfs.mkdirs(new Path("/dir2"));
      assertTrue(dfs.exists(new Path("/dir2")));

      // Failover/restart the namenode or such stuff.
      cluster.restartNameNodes(); // Namenode crashed. Had it been a failover,
      // the standby would also crash, and ultimately the whole cluster.

      // Will not reach here itself. :-(
      cluster.waitActive();

      dfs.listStatus(new Path("/dir1"));
    }
  }
{code}
{quote}The "getAllSnapshottableDirs()" in itslef is not a heavy call IMO. It 
does not depend on the no of snapshots present in the system.
{quote}
OK, say getAllSnapshottableDirs() might not be heavy even if there are tons of 
snapshottable directories. Still, that means a getFileInfo for all these 
directories and then, in the worst case, a mkdirs for all of them.
 So a normal scenario is like (sketched below):
 1 getAllSnapshottableDirs call -> say it fetches 2 million dirs
 2 million getFileInfo() calls -> say in the average case 1 million don't have 
trash
 1 million mkdirs() calls -> a write call isn't considered cheap and fast; you 
go to the JNs and stuff.
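
Continuing from the tests above (dfs being the DistributedFileSystem), a rough 
sketch of the loop whose cost is being debated; this is illustrative, not the 
actual FSNamesystem startup code:
{code:java}
// Illustrative cost model: one listing RPC, then up to two more operations
// per snapshottable directory.
void provisionSnapshotTrash(DistributedFileSystem dfs) throws IOException {
  SnapshottableDirectoryStatus[] dirs = dfs.getSnapshottableDirListing();
  if (dirs == null) {
    return; // no snapshottable directories
  }
  for (SnapshottableDirectoryStatus dir : dirs) {
    Path trash = new Path(dir.getFullPath(), FileSystem.TRASH_PREFIX);
    if (!dfs.exists(trash)) { // one getFileInfo per directory
      dfs.mkdirs(trash);      // one edit-logged write per missing .Trash
    }
  }
}
{code}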

Even if you get this creation of snapshot trash into the filesystem spec as 
well, you still won't get rid of any of these problems.

Nevertheless, whatever the case, a normally running cluster shouldn't crash 
due to any feature, and that too during failovers; that is some crazy stuff.

And regarding the encryption zone stuff, are you saying it is similar to 
HDFS-10324 (I see this only linked on HDFS-15607)? Well, I don't think it does 
create-like stuff during startup. Will see if [~weichiu] can confirm that; he 
worked on it. Didn't dig in much though.

Well, not very sure of the use case and things here, so I would leave it to 
you guys. Please don't hold anything for me in the

[jira] [Work started] (HDFS-15968) Improve the log for the DecayRpcScheduler

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15968 started by Bhavik Patel.
---
> Improve the log for the DecayRpcScheduler 
> --
>
> Key: HDFS-15968
> URL: https://issues.apache.org/jira/browse/HDFS-15968
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
>
> Improve the log for the DecayRpcScheduler to make use of the SLF4J logger 
> factory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15968) Improve the log for the DecayRpcScheduler

2021-04-12 Thread Bhavik Patel (Jira)
Bhavik Patel created HDFS-15968:
---

 Summary: Improve the log for the DecayRpcScheduler 
 Key: HDFS-15968
 URL: https://issues.apache.org/jira/browse/HDFS-15968
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Bhavik Patel
Assignee: Bhavik Patel


Improve the log for the DecayRpcScheduler to make use of the SLF4J logger 
factory



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15967) Improve the log for Short Circuit Local Reads

2021-04-12 Thread Bhavik Patel (Jira)
Bhavik Patel created HDFS-15967:
---

 Summary: Improve the log for Short Circuit Local Reads
 Key: HDFS-15967
 URL: https://issues.apache.org/jira/browse/HDFS-15967
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Bhavik Patel
Assignee: Bhavik Patel


Improve the log for Short Circuit Local Reads 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work started] (HDFS-15967) Improve the log for Short Circuit Local Reads

2021-04-12 Thread Bhavik Patel (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-15967 started by Bhavik Patel.
---
> Improve the log for Short Circuit Local Reads
> -
>
> Key: HDFS-15967
> URL: https://issues.apache.org/jira/browse/HDFS-15967
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
>
> Improve the log for Short Circuit Local Reads 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread zhanghuazong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhanghuazong resolved HDFS-15966.
-
Resolution: Fixed

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Assignee: zhanghuazong
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue.
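
A hedged sketch of what clearing these statistics might look like, assuming 
the two gauges live beside the low-redundancy queues as LongAdder counters 
(the surrounding class is illustrative; the field names follow the 
description):
{code:java}
import java.util.concurrent.atomic.LongAdder;

class LowRedundancyQueuesExample {
  private final LongAdder highestPriorityLowRedundancyReplicatedBlocks =
      new LongAdder();
  private final LongAdder highestPriorityLowRedundancyECBlocks =
      new LongAdder();

  // When the queues are emptied, the gauges derived from them must be reset
  // too, or they keep reporting stale counts.
  synchronized void clear() {
    // ... clear the underlying priority queues here ...
    highestPriorityLowRedundancyReplicatedBlocks.reset();
    highestPriorityLowRedundancyECBlocks.reset();
  }
}
{code}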



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread zhanghuazong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhanghuazong reassigned HDFS-15966:
---

Assignee: zhanghuazong

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Assignee: zhanghuazong
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580847
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 09:27
Start Date: 12/Apr/21 09:27
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on pull request #2889:
URL: https://github.com/apache/hadoop/pull/2889#issuecomment-817648020


   The failed junit tests have nothing to do with this PR. I ran them locally 
and all of them passed.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580847)
Time Spent: 2h 20m  (was: 2h 10m)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.
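
A sketch of the fix pattern: if the BlockSender constructor throws before the 
volume reference has been handed off, the catch block must release it. This 
mirrors the IOUtils.cleanupWithLogger hunk quoted in the review thread below; 
the surrounding variable names are abbreviated from the constructor, not a 
complete listing.
{code:java}
FsVolumeReference volumeRef = datanode.data.getVolume(block).obtainReference();
try {
  // Ownership of volumeRef passes to ris only if construction succeeds.
  ris = new ReplicaInputStreams(blockIn, checksumIn, volumeRef, fileIoProvider);
} catch (IOException ioe) {
  // Construction failed (e.g. missing meta file): release the reference so
  // the volume's refcount can drop to 0 and removal can complete.
  IOUtils.cleanupWithLogger(null, volumeRef);
  throw ioe;
}
{code}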



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-15966:
--
Labels: pull-request-available  (was: )

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?focusedWorklogId=580846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580846
 ]

ASF GitHub Bot logged work on HDFS-15966:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 09:26
Start Date: 12/Apr/21 09:26
Worklog Time Spent: 10m 
  Work Description: langlaile1221 opened a new pull request #2894:
URL: https://github.com/apache/hadoop/pull/2894


   …ant queue
   
   ## NOTICE
   
   Please create an issue in ASF JIRA before opening a pull request,
   and you need to set the title of the pull request which starts with
   the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
   For more details, please see 
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580846)
Remaining Estimate: 0h
Time Spent: 10m

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread zhanghuazong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhanghuazong updated HDFS-15966:

Description: Clear the two indicators 
highestPriorityLowRedundancyReplicatedBlocks and 
highestPriorityLowRedundancyECBlocks when emptying the redundant queue. 
 (was: Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks 
and highestPriorityLowRedundancyECBlocks when emptying the redundant 
queue,)

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Priority: Minor
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue

2021-04-12 Thread zhanghuazong (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhanghuazong updated HDFS-15966:

Summary: Empty the statistical parameters when emptying the redundant queue 
 (was: Empty the statistical parameters when emptying the redundant queue, )

> Empty the statistical parameters when emptying the redundant queue
> --
>
> Key: HDFS-15966
> URL: https://issues.apache.org/jira/browse/HDFS-15966
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: zhanghuazong
>Priority: Minor
>
> Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
> highestPriorityLowRedundancyECBlocks when emptying the redundant 
> queue,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue,

2021-04-12 Thread zhanghuazong (Jira)
zhanghuazong created HDFS-15966:
---

 Summary: Empty the statistical parameters when emptying the 
redundant queue, 
 Key: HDFS-15966
 URL: https://issues.apache.org/jira/browse/HDFS-15966
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: zhanghuazong


Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and 
highestPriorityLowRedundancyECBlocks when emptying the redundant queue,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319168#comment-17319168
 ] 

Shashikant Banerjee edited comment on HDFS-15614 at 4/12/21, 8:46 AM:
--

Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy 
call IMO. It does not depend on the number of snapshots present in the system.

 
{code:java}
1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes 
will try this stuff in an attempt to become active and come out of safemode. 
Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, 
I can tell you one which I tried: Namespace Quotas, and yep the namenode 
crashed. can be bunch of such cases
{code}
If mkdir fails to create the Trash directory inside the snapshot root, then 
strict ordering/processing of all entries during snapshot deletion cannot be 
guaranteed. If this feature needs to be used, .Trash needs to be within the 
snapshottable directory, which is similar to the case with encryption zones.

 

 

 
{code:java}
2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin 
he didn't had any Trash directory in the snapshot dir, Suddenly a failover 
happened, he would get a trash directory in its snapshot directory, Which he 
never created.{code}
If a new directory is made snapshottable with the feature flag turned on, the 
.Trash directory gets created implicitly as a part of the allowSnapshot call. 
I don't think there is an ambiguity here.
{code:java}
Third, The time cost, The namenode startup or the namenode failover or let it 
be coming out of safemode should be fast, They are actually contributing to 
cluster down time, and here we are doing like first getSnapshottableDirs which 
itself would be a heavy call if you have a lot of snapshots, then for each 
directory, one by one we are doing a getFileInfo and then a mkdir, seems like 
time-consuming. Not sure about the memory consumption at that point due to this 
though...
{code}
I don't think getSnapshottableDirs() is a very heavy call in typical setups. 
It has nothing to do with the number of snapshots that exist in the system.
{code:java}
Fourth, Why the namenode needs to do a client operation? It is the server. And 
that too while starting up, This mkdirs from namenode to self is itself 
suspicious, Bunch of namenode crashing coming up trying to become active, 
trying to push same edits, Hopefully you would have taken that into account and 
pretty sure such things won't occur, Namenodes won't collide even in the rarest 
cases. yep and all safe with the permissions..
{code}
This is important for provisioning snapshot trash, so that the ordered 
snapshot deletion feature can be used when the system already has pre-existing 
snapshottable directories.

 


was (Author: shashikant):
Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy 
call IMO. It does not depend on the number of snapshots present in the system.

 
{code:java}
1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes 
will try this stuff in an attempt to become active and come out of safemode. 
Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, 
I can tell you one which I tried: Namespace Quotas, and yep the namenode 
crashed. can be bunch of such cases
{code}
If mkdir fails to create the Trash directory inside the snapshot root, then 
strict ordering/processing of all entries during snapshot deletion cannot be 
guaranteed. If this feature needs to be used, .Trash needs to be within the 
snapshottable directory, which is similar to the case with encryption zones.

 

 

 
{code:java}
2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin 
he didn't had any Trash directory in the snapshot dir, Suddenly a failover 
happened, he would get a trash directory in its snapshot directory, Which he 
never created.{code}
If a new directory is made snapshottable with the feature flag turned on, the 
.Trash directory gets created implicitly as a part of the allowSnapshot call. 
I don't think there is an ambiguity here.
{code:java}
Third, The time cost, The namenode startup or the namenode failover or let it 
be coming out of safemode should be fast, They are actually contributing to 
cluster down time, and here we are doing like first getSnapshottableDirs which 
itself would be a heavy call if you have a lot of snapshots, then for each 
directory, one by one we are doing a getFileInfo and then a mkdir, seems like 
time-consuming. Not sure about the memory consumption at that point due to this 
though...
{code}
I don't think getSnapshottableDirs() is a very heavy call in typical setups. 
It has nothing to do with the number of snapshots that exist in the system.
{code:java}
Fourth, Why the namenode needs to do a client operation? It is the server. And 
that too while starting up, This mkdirs from namenode to 

[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled

2021-04-12 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319168#comment-17319168
 ] 

Shashikant Banerjee commented on HDFS-15614:


Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy 
call IMO. It does not depend on the number of snapshots present in the system.

 
{code:java}
1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes 
will try this stuff in an attempt to become active and come out of safemode. 
Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, 
I can tell you one which I tried: Namespace Quotas, and yep the namenode 
crashed. can be bunch of such cases
{code}
If mkdir fails to create the Trash directory inside the snapshot root, then 
strict ordering/processing of all entries during snapshot deletion cannot be 
guaranteed. If this feature needs to be used, .Trash needs to be within the 
snapshottable directory, which is similar to the case with encryption zones.

 

 

 
{code:java}
2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin 
he didn't had any Trash directory in the snapshot dir, Suddenly a failover 
happened, he would get a trash directory in its snapshot directory, Which he 
never created.{code}
If a new directory is made snapshottable with the feature flag turned on, the 
.Trash directory gets created implicitly as a part of the allowSnapshot call. 
I don't think there is an ambiguity here.
{code:java}
Third, The time cost, The namenode startup or the namenode failover or let it 
be coming out of safemode should be fast, They are actually contributing to 
cluster down time, and here we are doing like first getSnapshottableDirs which 
itself would be a heavy call if you have a lot of snapshots, then for each 
directory, one by one we are doing a getFileInfo and then a mkdir, seems like 
time-consuming. Not sure about the memory consumption at that point due to this 
though...
{code}
I don't think getSnapshottableDirs() is a very heavy call in typical setups. 
It has nothing to do with the number of snapshots that exist in the system.
{code:java}
Fourth, Why the namenode needs to do a client operation? It is the server. And 
that too while starting up, This mkdirs from namenode to self is itself 
suspicious, Bunch of namenode crashing coming up trying to become active, 
trying to push same edits, Hopefully you would have taken that into account and 
pretty sure such things won't occur, Namenodes won't collide even in the rarest 
cases. yep and all safe with the permissions..
{code}
This is important for provisioning snapshot trash, so that the ordered 
snapshot deletion feature can be used when the system already has pre-existing 
snapshottable directories.

 

> Initialize snapshot trash root during NameNode startup if enabled
> -
>
> Key: HDFS-15614
> URL: https://issues.apache.org/jira/browse/HDFS-15614
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This is a follow-up to HDFS-15607.
> Goal:
> Initialize (create) snapshot trash root for all existing snapshottable 
> directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to 
> {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually 
> on all those existing snapshottable directories.
> The change is expected to land in {{FSNamesystem}}.
> Discussion:
> 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the 
> client side. But in order for NN to create it at startup, the logic must 
> (also) be implemented on the server side as well. -- which is also a 
> requirement by WebHDFS (HDFS-15612).
> 2. Alternatively, we can provide an extra parameter to the 
> {{-provisionTrash}} command like: {{dfsadmin -provisionTrash -all}} to 
> initialize/provision trash root on all existing snapshottable dirs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580826
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 08:16
Start Date: 12/Apr/21 08:16
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611418779



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1805,4 +1806,38 @@ public void testNotifyNamenodeMissingOrNewBlock() throws 
Exception {
   cluster.shutdown();
 }
   }
+
+  @Test

Review comment:
   Thanks, I'll add it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580826)
Time Spent: 2h 10m  (was: 2h)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580819
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 07:59
Start Date: 12/Apr/21 07:59
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611407579



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java
##
@@ -562,4 +565,57 @@ void writeBlock(ExtendedBlock block, 
BlockConstructionStage stage,
 checksum, CachingStrategy.newDefaultStrategy(), false, false,
 null, null, new String[0]);
   }
+
+  @Test
+  public void testReleaseVolumeRefIfExceptionThrown() throws IOException {
+Path file = new Path("dataprotocol.dat");
+int numDataNodes = 1;
+
+Configuration conf = new HdfsConfiguration();
+conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, numDataNodes);
+MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes(
+numDataNodes).build();
+try {
+  cluster.waitActive();
+  datanode = cluster.getFileSystem().getDataNodeStats(
+  DatanodeReportType.LIVE)[0];
+  dnAddr = NetUtils.createSocketAddr(datanode.getXferAddr());
+  FileSystem fileSys = cluster.getFileSystem();
+
+  int fileLen = Math.min(
+  conf.getInt(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 4096), 4096);
+
+  DFSTestUtil.createFile(fileSys, file, fileLen, fileLen,
+  fileSys.getDefaultBlockSize(file),
+  fileSys.getDefaultReplication(file), 0L);
+
+  // get the first blockid for the file
+  final ExtendedBlock firstBlock = DFSTestUtil.getFirstBlock(fileSys, 
file);
+
+  String bpid = cluster.getNamesystem().getBlockPoolId();
+  ExtendedBlock blk = new ExtendedBlock(bpid, firstBlock.getLocalBlock());
+  sendBuf.reset();
+  recvBuf.reset();
+
+  // delete the meta file to create a exception in BlockSender constructor
+  DataNode dn = cluster.getDataNodes().get(0);
+  cluster.getMaterializedReplica(0, blk).deleteMeta();
+
+  FsVolumeImpl volume = (FsVolumeImpl) DataNodeTestUtils.getFSDataset(
+  dn).getVolume(blk);
+  int beforeCnt = volume.getReferenceCount();
+
+  sender.copyBlock(blk, BlockTokenSecretManager.DUMMY_TOKEN);
+  sendRecvData("Copy a block.", false);
+  Thread.sleep(1000);
+
+  int afterCnt = volume.getReferenceCount();
+  assertEquals(beforeCnt, afterCnt);

Review comment:
   I confirmed that this case has been handled. The reference will be 
closed when we close the corresponding BlockSender.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580819)
Time Spent: 2h  (was: 1h 50m)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This 

[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580817
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 07:49
Start Date: 12/Apr/21 07:49
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611401338



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java
##
@@ -167,18 +167,26 @@ synchronized long countPendingDeletions() {
* Execute the task sometime in the future, using ThreadPools.
*/
   synchronized void execute(FsVolumeImpl volume, Runnable task) {
-if (executors == null) {
-  throw new RuntimeException("AsyncDiskService is already shutdown");
-}
-if (volume == null) {
-  throw new RuntimeException("A null volume does not have a executor");
-}
-ThreadPoolExecutor executor = executors.get(volume.getStorageID());
-if (executor == null) {
-  throw new RuntimeException("Cannot find volume " + volume
-  + " for execution of task " + task);
-} else {
-  executor.execute(task);
+try {

Review comment:
   The cleanup code is in the finally block, so it will be executed even if an 
exception occurs. Thanks for your suggestions; I will fix it to make the style 
consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580817)
Time Spent: 1h 50m  (was: 1h 40m)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when 
> removing the volume, other threads trying to acquire the same lock will be 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580810=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580810
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 07:34
Start Date: 12/Apr/21 07:34
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611391944



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
##
@@ -432,6 +432,7 @@
   ris = new ReplicaInputStreams(
   blockIn, checksumIn, volumeRef, fileIoProvider);
 } catch (IOException ioe) {
+  IOUtils.cleanupWithLogger(null, volumeRef);

Review comment:
   If there are no exceptions, the reference will be closed when its 
BlockSender is closed. I checked that the code that constructs the BlockSender 
closes it after it is used.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580810)
Time Spent: 1h 40m  (was: 1.5h)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
> FsVolumeImpl volume = it.next();
> if (!volume.checkClosed()) {
>   return false;
> }
> it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
> FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
> this, reference.getReferenceCount());
> return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the removing thread holds checkDirsLock while 
> removing the volume, other threads trying to acquire the same lock are 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.
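
As a rough illustration of the release pattern the patch applies (a sketch under assumed names, not the committed code): whoever obtains a volume reference must release it on every failure path, typically via IOUtils.cleanupWithLogger, so the reference count can reach zero and removal can proceed.

{code:java}
void doWorkOnVolume(FsVolumeImpl volume) throws IOException {
  FsVolumeReference volumeRef = volume.obtainReference();
  try {
    workThatMayThrow(volumeRef); // hypothetical stand-in for the real work
  } catch (IOException | RuntimeException e) {
    // Release on failure so checkClosed() can observe a zero count.
    IOUtils.cleanupWithLogger(null, volumeRef);
    throw e;
  }
}
{code}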



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580803
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 07:23
Start Date: 12/Apr/21 07:23
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611385242



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java
##
@@ -153,16 +154,24 @@ synchronized boolean queryVolume(FsVolumeImpl volume) {
    * Execute the task sometime in the future, using ThreadPools.
    */
   synchronized void execute(String storageId, Runnable task) {
-    if (executors == null) {
-      throw new RuntimeException(
-          "AsyncLazyPersistService is already shutdown");
-    }
-    ThreadPoolExecutor executor = executors.get(storageId);
-    if (executor == null) {
-      throw new RuntimeException("Cannot find root storage volume with id " +
-          storageId + " for execution of task " + task);
-    } else {
-      executor.execute(task);
+    try {

Review comment:
   Yes, it is. But a RuntimeException will be caught here only when the task 
has not been executed. Once the task executes, its try ... with block will 
catch all exceptions there.
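
Since the hunk above is truncated at the added try, here is one plausible shape of the patched method (an assumption based on this thread, not the committed code): the RuntimeException thrown before the task runs is caught, the task's volume reference is released, and the exception is rethrown.

{code:java}
synchronized void execute(String storageId, Runnable task) {
  try {
    if (executors == null) {
      throw new RuntimeException(
          "AsyncLazyPersistService is already shutdown");
    }
    ThreadPoolExecutor executor = executors.get(storageId);
    if (executor == null) {
      throw new RuntimeException("Cannot find root storage volume with id " +
          storageId + " for execution of task " + task);
    }
    executor.execute(task);
  } catch (RuntimeException re) {
    // Hypothetical helper: release the volume reference held by the task
    // before rethrowing, so the reference count can drop to zero.
    releaseVolumeReference(task);
    throw re;
  }
}
{code}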




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580803)
Time Spent: 1.5h  (was: 1h 20m)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the removing thread holds checkDirsLock while 
> removing the volume, other threads trying to acquire the same lock are 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop

2021-04-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580801&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580801
 ]

ASF GitHub Bot logged work on HDFS-15963:
-

Author: ASF GitHub Bot
Created on: 12/Apr/21 07:15
Start Date: 12/Apr/21 07:15
Worklog Time Spent: 10m 
  Work Description: zhangshuyan0 commented on a change in pull request 
#2889:
URL: https://github.com/apache/hadoop/pull/2889#discussion_r611380793



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
##
@@ -1805,4 +1806,38 @@ public void testNotifyNamenodeMissingOrNewBlock() throws Exception {
       cluster.shutdown();
     }
   }
+
+  @Test
+  public void testReleaseVolumeRefIfExceptionThrown() throws IOException {
+    MiniDFSCluster cluster = new MiniDFSCluster.Builder(
+        new HdfsConfiguration()).build();
+    cluster.waitActive();
+    FsVolumeImpl vol = (FsVolumeImpl) dataset.getFsVolumeReferences().get(0);
+    ExtendedBlock eb;
+    ReplicaInfo info;
+    int beforeCnt = 0;
+    try {
+      List<Block> blockList = new ArrayList<Block>();
+      eb = new ExtendedBlock(BLOCKPOOL, 1, 1, 1001);
+      info = new FinalizedReplica(
+          eb.getLocalBlock(), vol, vol.getCurrentDir().getParentFile());
+      dataset.volumeMap.add(BLOCKPOOL, info);
+      ((LocalReplica) info).getBlockFile().createNewFile();
+      ((LocalReplica) info).getMetaFile().createNewFile();
+      blockList.add(info);
+
+      // Create a runtime exception
+      dataset.asyncDiskService.shutdown();
+
+      beforeCnt = vol.getReferenceCount();
+      dataset.invalidate(BLOCKPOOL, blockList.toArray(new Block[0]));
+
+    } catch (RuntimeException re) {
+      int afterCnt = vol.getReferenceCount();
+      assertEquals(beforeCnt, afterCnt);
+      re.printStackTrace();

Review comment:
   Ok, I'll remove it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 580801)
Time Spent: 1h 20m  (was: 1h 10m)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Shuyan Zhang
>Assignee: Shuyan Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, 
> HDFS-15963.003.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found, 
> the volume reference obtained by the thread is not released, which causes the 
> thread trying to remove the volume to wait and fall into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the removing thread holds checkDirsLock while 
> removing the volume, other threads trying to acquire the same lock are 
> permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and 
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org