[jira] [Work logged] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode
[ https://issues.apache.org/jira/browse/HDFS-15815?focusedWorklogId=581562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581562 ]

ASF GitHub Bot logged work on HDFS-15815:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 05:49
Start Date: 13/Apr/21 05:49
Worklog Time Spent: 10m

Work Description: ayushtkn commented on pull request #2882:
URL: https://github.com/apache/hadoop/pull/2882#issuecomment-818456966

Sure @jojochuang, Thanx

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 581562)
Time Spent: 50m (was: 40m)

> if required storageType are unavailable, log the failed reason during
> choosing Datanode
> ---------------------------------------------------------------------
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: block placement
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, HDFS-15815.003.patch
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> For better debugging, if the required storageType is unavailable, log the
> failed reason "NO_REQUIRED_STORAGE_TYPE" when choosing a Datanode.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-15912:
----------------------------------
Labels: pull-request-available (was: )

> Allow ProtobufRpcEngine to be extensible
> ----------------------------------------
>
> Key: HDFS-15912
> URL: https://issues.apache.org/jira/browse/HDFS-15912
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ProtobufRpcEngine class doesn't allow new RpcEngine implementations to
> extend some of its inner classes (e.g. Invoker and
> Server.ProtoBufRpcInvoker). Also, some of its methods are long enough that
> overriding them would result in a lot of code duplication (e.g.
> Invoker#invoke and Server.ProtoBufRpcInvoker#call).
> When implementing a new RpcEngine, it would be helpful to reuse most of the
> code already in ProtobufRpcEngine. This would allow new fields to be added
> to the RPC header or message with minimal code changes.
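The kind of refactoring the issue asks for can be sketched with the hook-method pattern: keep the long shared logic in one place and expose small protected methods a subclass can override. All class and method names below are hypothetical stand-ins, not the actual ProtobufRpcEngine API.

```java
// Hypothetical sketch (not the real Hadoop code): a long monolithic call path
// is split into protected hooks so a new engine can extend behavior (e.g. add
// fields to the RPC header) without duplicating the whole method body.
public class RpcEngineSketch {

  /** Stand-in for a long inner-class method such as ProtoBufRpcInvoker#call. */
  public static class Invoker {
    public String call(String message) {
      // The long shared logic stays here; subclasses override only the hooks.
      return decorateHeader("header") + "|" + processMessage(message);
    }

    // Hook: a subclass can add new fields to the RPC header here.
    protected String decorateHeader(String header) {
      return header;
    }

    // Hook: a subclass can transform the message here.
    protected String processMessage(String message) {
      return message;
    }
  }

  /** A new engine reuses call() unchanged and overrides a single hook. */
  public static class ExtendedInvoker extends Invoker {
    @Override
    protected String decorateHeader(String header) {
      return header + "+traceId";
    }
  }
}
```

The design choice is that the base method never needs to be copied: the extension point is the hook, so header changes stay minimal, as the issue description suggests.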
[jira] [Work logged] (HDFS-15912) Allow ProtobufRpcEngine to be extensible
[ https://issues.apache.org/jira/browse/HDFS-15912?focusedWorklogId=581560=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581560 ]

ASF GitHub Bot logged work on HDFS-15912:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 05:41
Start Date: 13/Apr/21 05:41
Worklog Time Spent: 10m

Work Description: hchaverri opened a new pull request #2901:
URL: https://github.com/apache/hadoop/pull/2901

## NOTICE

Please create an issue in ASF JIRA before opening a pull request,
and you need to set the title of the pull request which starts with
the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.)
For more details, please see
https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

Issue Time Tracking
-------------------
Worklog Id: (was: 581560)
Remaining Estimate: 0h
Time Spent: 10m

> Allow ProtobufRpcEngine to be extensible
> ----------------------------------------
>
> Key: HDFS-15912
> URL: https://issues.apache.org/jira/browse/HDFS-15912
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Hector Sandoval Chaverri
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The ProtobufRpcEngine class doesn't allow new RpcEngine implementations to
> extend some of its inner classes (e.g. Invoker and
> Server.ProtoBufRpcInvoker). Also, some of its methods are long enough that
> overriding them would result in a lot of code duplication (e.g.
> Invoker#invoke and Server.ProtoBufRpcInvoker#call).
> When implementing a new RpcEngine, it would be helpful to reuse most of the
> code already in ProtobufRpcEngine. This would allow new fields to be added
> to the RPC header or message with minimal code changes.
[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319891#comment-17319891 ]

Fengnan Li commented on HDFS-15423:
-----------------------------------
[~elgoiri] Sure, I will create a new one.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -------------------------------------------------------------
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf, webhdfs
> Reporter: Chao Sun
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first
> gets all DNs via {{getDatanodeReport}} and then randomly picks one from the
> list via {{getRandomDatanode}}. This logic doesn't seem correct, as it
> should pick a DN from the specific cluster(s) of the input {{path}}.
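The fix direction described in the issue — restricting the random choice to the sub-cluster that serves the path — can be sketched as below. The class, field, and method names are illustrative stand-ins, not the real RouterWebHdfsMethods code; in particular, resolving a path to a nameservice is assumed to have happened already.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: instead of picking a random DN from the merged report
// of every sub-cluster, filter the report down to the nameservice the mount
// table resolved the path to, and pick only among those datanodes.
public class ChooseDatanodeSketch {

  public static class DatanodeInfo {
    final String name;
    final String nameserviceId; // sub-cluster this DN belongs to

    DatanodeInfo(String name, String nameserviceId) {
      this.name = name;
      this.nameserviceId = nameserviceId;
    }
  }

  /** Pick a random DN restricted to the sub-cluster that serves the path. */
  public static DatanodeInfo chooseDatanode(List<DatanodeInfo> allDns,
      String targetNameservice, Random rnd) {
    List<DatanodeInfo> candidates = new ArrayList<>();
    for (DatanodeInfo dn : allDns) {
      if (dn.nameserviceId.equals(targetNameservice)) {
        candidates.add(dn);
      }
    }
    if (candidates.isEmpty()) {
      return null; // no DN available in the resolved sub-cluster
    }
    return candidates.get(rnd.nextInt(candidates.size()));
  }
}
```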
[jira] [Commented] (HDFS-15972) Fedbalance only copies data partially when there's existing opened file
[ https://issues.apache.org/jira/browse/HDFS-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319870#comment-17319870 ]

Felix N commented on HDFS-15972:
--------------------------------
Hi [~LiJinglun], is this the expected behavior? During heavy write periods,
this might lead to data loss.

> Fedbalance only copies data partially when there's existing opened file
> -----------------------------------------------------------------------
>
> Key: HDFS-15972
> URL: https://issues.apache.org/jira/browse/HDFS-15972
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Felix N
> Priority: Major
>
> If there are opened files when fedbalance is run and data is being written
> to these files, fedbalance might skip the newly written data.
> Steps to recreate the issue:
> # Create a dummy file /test/file with some data: {{echo "start" | hdfs dfs -appendToFile /test/file}}
> # Start writing to the file: {{hdfs dfs -appendToFile /test/file}} but do not stop writing
> # Run fedbalance: {{hadoop fedbalance submit hdfs://ns1/test hdfs://ns2/test}}
> # Write something to the file while fedbalance is running, "end" for example, then stop writing
> # After fedbalance is done, {{hdfs://ns2/test/file}} should only contain "start" while {{hdfs://ns1/user/hadoop/.Trash/Current/test/file}} contains "start\nend"
> Fedbalance is run with default configs and arguments, so no diff should happen.
[jira] [Moved] (HDFS-15972) Fedbalance only copies data partially when there's existing opened file
[ https://issues.apache.org/jira/browse/HDFS-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Felix N moved HADOOP-17634 to HDFS-15972:
-----------------------------------------
Key: HDFS-15972 (was: HADOOP-17634)
Project: Hadoop HDFS (was: Hadoop Common)

> Fedbalance only copies data partially when there's existing opened file
> -----------------------------------------------------------------------
>
> Key: HDFS-15972
> URL: https://issues.apache.org/jira/browse/HDFS-15972
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Felix N
> Priority: Major
>
> If there are opened files when fedbalance is run and data is being written
> to these files, fedbalance might skip the newly written data.
> Steps to recreate the issue:
> # Create a dummy file /test/file with some data: {{echo "start" | hdfs dfs -appendToFile /test/file}}
> # Start writing to the file: {{hdfs dfs -appendToFile /test/file}} but do not stop writing
> # Run fedbalance: {{hadoop fedbalance submit hdfs://ns1/test hdfs://ns2/test}}
> # Write something to the file while fedbalance is running, "end" for example, then stop writing
> # After fedbalance is done, {{hdfs://ns2/test/file}} should only contain "start" while {{hdfs://ns1/user/hadoop/.Trash/Current/test/file}} contains "start\nend"
> Fedbalance is run with default configs and arguments, so no diff should happen.
[jira] [Work logged] (HDFS-15970) Print network topology on the web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581523 ]

ASF GitHub Bot logged work on HDFS-15970:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:51
Start Date: 13/Apr/21 03:51
Worklog Time Spent: 10m

Work Description: tomscut commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r612107351

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)

Review comment:
I'm sorry, I don't quite understand what you mean. Could you please give me
some specific suggestions? Thank you very much.

Issue Time Tracking
-------------------
Worklog Id: (was: 581523)
Time Spent: 1h (was: 50m)

> Print network topology on the web
> ---------------------------------
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
> Issue Type: Wish
> Reporter: tomscut
> Assignee: tomscut
> Priority: Minor
> Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> In order to query the network topology information conveniently, we can
> print it on the web.
[jira] [Work logged] (HDFS-15970) Print network topology on the web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581522 ]

ASF GitHub Bot logged work on HDFS-15970:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:47
Start Date: 13/Apr/21 03:47
Worklog Time Spent: 10m

Work Description: tomscut commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r612106334

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)
+      throws IOException {
+    final ServletContext context = getServletContext();
+    NameNode nn = NameNodeHttpServer.getNameNodeFromContext(context);
+    BlockManager bm = nn.getNamesystem().getBlockManager();
+    List<Node> leaves = bm.getDatanodeManager().getNetworkTopology()
+        .getLeaves(NodeBase.ROOT);
+
+    response.setContentType("text/plain; charset=UTF-8");
+    try (PrintStream out = new PrintStream(
+        response.getOutputStream(), false, "UTF-8")) {
+      printTopology(out, leaves);
+    } catch (Throwable t) {
+      String errMsg = "Print network topology failed. "
+          + StringUtils.stringifyException(t);
+      response.sendError(HttpServletResponse.SC_GONE, errMsg);
+      throw new IOException(errMsg);
+    } finally {
+      response.getOutputStream().close();
+    }
+  }
+
+  /**
+   * Display each rack and the nodes assigned to that rack, as determined
+   * by the NameNode, in a hierarchical manner. The nodes and racks are
+   * sorted alphabetically.
+   *
+   * @param stream print stream
+   * @param leaves leaves nodes under base scope
+   */
+  public void printTopology(PrintStream stream, List<Node> leaves) {
+    if (leaves.size() == 0) {
+      stream.print("No DataNodes");
+      return;
+    }
+
+    // Build a map of rack -> nodes from the datanode report
+    HashMap<String, TreeSet<String>> tree =
+        new HashMap<String, TreeSet<String>>();
+    for (Node dni : leaves) {
+      String location = dni.getNetworkLocation();
+      String name = dni.getName();
+
+      if (!tree.containsKey(location)) {

Review comment:
Thanks @goiri for your careful review, I will fix these problems quickly.

Issue Time Tracking
-------------------
Worklog Id: (was: 581522)
Time Spent: 50m (was: 40m)

> Print network topology on the web
> ---------------------------------
>
> Key: HDFS-15970
> URL: https://issues.apache.org/jira/browse/HDFS-15970
> Project: Hadoop HDFS
> Issue Type: Wish
> Reporter: tomscut
> Assignee: tomscut
> Priority: Minor
> Labels: pull-request-available
> Attachments: hdfs-topology.jpg, hdfs-web.jpg
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
[jira] [Work logged] (HDFS-15621) Datanode DirectoryScanner uses excessive memory
[ https://issues.apache.org/jira/browse/HDFS-15621?focusedWorklogId=581520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581520 ]

ASF GitHub Bot logged work on HDFS-15621:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:45
Start Date: 13/Apr/21 03:45
Worklog Time Spent: 10m

Work Description: jojochuang commented on pull request #2849:
URL: https://github.com/apache/hadoop/pull/2849#issuecomment-818409032

The spotbugs warning looks like a false positive to me.
`Redundant nullcheck of file, which is known to be non-null in
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.compileReport(File,
File, Collection, DirectoryScanner$ReportCompiler)
Redundant null check at FsVolumeImpl.java:[line 1477]`

Issue Time Tracking
-------------------
Worklog Id: (was: 581520)
Time Spent: 40m (was: 0.5h)

> Datanode DirectoryScanner uses excessive memory
> -----------------------------------------------
>
> Key: HDFS-15621
> URL: https://issues.apache.org/jira/browse/HDFS-15621
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.4.0
> Reporter: Stephen O'Donnell
> Assignee: Stephen O'Donnell
> Priority: Major
> Labels: pull-request-available
> Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 2020-10-09 at 15.20.56.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> We generally work to a rule of 1GB heap on a datanode per 1M blocks. For
> nodes with a lot of blocks, this can mean a lot of heap.
> We recently captured a heap dump of a DN with about 22M blocks and found
> only about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap is
> taken by the DirectoryScanner ScanInfo objects. Most of this memory was
> allocated to strings.
> Checking the strings in question, we can see two strings per ScanInfo,
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785_106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will
> be the same for every block in the block pool, as the scanner is only
> concerned with finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather
> than "subdir28/subdir17" and then construct the path when it is requested
> via the getter.
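The memory idea in the report — keep only the two subdir indexes and the block id, and rebuild the long path on demand in the getter — can be sketched as below. Field and method names are illustrative, not the real DirectoryScanner.ScanInfo.

```java
// Hypothetical sketch of the proposed compression: four primitive fields
// (~24 bytes) replace two long Strings per replica; the common
// ".../current/finalized/" prefix is shared by every block in the pool and
// never stored per ScanInfo.
public class ScanInfoSketch {
  private final int subdir1;
  private final int subdir2;
  private final long blockId;
  private final long genStamp;

  public ScanInfoSketch(int subdir1, int subdir2, long blockId, long genStamp) {
    this.subdir1 = subdir1;
    this.subdir2 = subdir2;
    this.blockId = blockId;
    this.genStamp = genStamp;
  }

  /** Rebuild the suffix under .../current/finalized/ only when asked. */
  public String getBlockPathSuffix() {
    return "subdir" + subdir1 + "/subdir" + subdir2 + "/blk_" + blockId;
  }

  /** The corresponding meta-file suffix, derived rather than stored. */
  public String getMetaPathSuffix() {
    return getBlockPathSuffix() + "_" + genStamp + ".meta";
  }
}
```

Using the example path from the description, `new ScanInfoSketch(28, 17, 1180438785L, 106716708L)` reproduces `subdir28/subdir17/blk_1180438785_106716708.meta` without retaining it between scans.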
[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319861#comment-17319861 ]

Íñigo Goiri commented on HDFS-15423:
------------------------------------
I reverted the merge. [~fengnanli], do you mind creating a new PR fixing the
compilation issue? I'm curious why Yetus didn't catch this.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -------------------------------------------------------------
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf, webhdfs
> Reporter: Chao Sun
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first
> gets all DNs via {{getDatanodeReport}} and then randomly picks one from the
> list via {{getRandomDatanode}}. This logic doesn't seem correct, as it
> should pick a DN from the specific cluster(s) of the input {{path}}.
[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581518 ]

ASF GitHub Bot logged work on HDFS-15423:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:44
Start Date: 13/Apr/21 03:44
Worklog Time Spent: 10m

Work Description: goiri opened a new pull request #2900:
URL: https://github.com/apache/hadoop/pull/2900

Reverts apache/hadoop#2605

Issue Time Tracking
-------------------
Worklog Id: (was: 581518)
Time Spent: 6h 20m (was: 6h 10m)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -------------------------------------------------------------
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf, webhdfs
> Reporter: Chao Sun
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 6h 20m
> Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first
> gets all DNs via {{getDatanodeReport}} and then randomly picks one from the
> list via {{getRandomDatanode}}. This logic doesn't seem correct, as it
> should pick a DN from the specific cluster(s) of the input {{path}}.
[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581519=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581519 ]

ASF GitHub Bot logged work on HDFS-15423:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:44
Start Date: 13/Apr/21 03:44
Worklog Time Spent: 10m

Work Description: goiri merged pull request #2900:
URL: https://github.com/apache/hadoop/pull/2900

Issue Time Tracking
-------------------
Worklog Id: (was: 581519)
Time Spent: 6.5h (was: 6h 20m)

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -------------------------------------------------------------
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf, webhdfs
> Reporter: Chao Sun
> Assignee: Fengnan Li
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Time Spent: 6.5h
> Remaining Estimate: 0h
>
> In {{RouterWebHdfsMethods}}, for a {{CREATE}} call, {{chooseDatanode}} first
> gets all DNs via {{getDatanodeReport}} and then randomly picks one from the
> list via {{getRandomDatanode}}. This logic doesn't seem correct, as it
> should pick a DN from the specific cluster(s) of the input {{path}}.
[jira] [Work logged] (HDFS-15970) Print network topology on the web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581517 ]

ASF GitHub Bot logged work on HDFS-15970:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:43
Start Date: 13/Apr/21 03:43
Worklog Time Spent: 10m

Work Description: goiri commented on a change in pull request #2896:
URL: https://github.com/apache/hadoop/pull/2896#discussion_r611908007

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import org.apache.hadoop.classification.InterfaceAudience;
+import org.apache.hadoop.hdfs.server.blockmanagement.BlockManager;
+import org.apache.hadoop.net.NetUtils;
+import org.apache.hadoop.net.Node;
+import org.apache.hadoop.net.NodeBase;
+import org.apache.hadoop.util.StringUtils;
+
+import javax.servlet.ServletContext;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.TreeSet;
+
+/**
+ * A servlet to print out the network topology.
+ */
+@InterfaceAudience.Private
+public class NetworkTopologyServlet extends DfsServlet {
+
+  public static final String PATH_SPEC = "/topology";
+
+  @Override
+  public void doGet(HttpServletRequest request, HttpServletResponse response)
+      throws IOException {
+    final ServletContext context = getServletContext();
+    NameNode nn = NameNodeHttpServer.getNameNodeFromContext(context);
+    BlockManager bm = nn.getNamesystem().getBlockManager();
+    List<Node> leaves = bm.getDatanodeManager().getNetworkTopology()
+        .getLeaves(NodeBase.ROOT);
+
+    response.setContentType("text/plain; charset=UTF-8");
+    try (PrintStream out = new PrintStream(
+        response.getOutputStream(), false, "UTF-8")) {
+      printTopology(out, leaves);
+    } catch (Throwable t) {
+      String errMsg = "Print network topology failed. "
+          + StringUtils.stringifyException(t);
+      response.sendError(HttpServletResponse.SC_GONE, errMsg);
+      throw new IOException(errMsg);
+    } finally {
+      response.getOutputStream().close();
+    }
+  }
+
+  /**
+   * Display each rack and the nodes assigned to that rack, as determined
+   * by the NameNode, in a hierarchical manner. The nodes and racks are
+   * sorted alphabetically.
+   *
+   * @param stream print stream
+   * @param leaves leaves nodes under base scope
+   */
+  public void printTopology(PrintStream stream, List<Node> leaves) {
+    if (leaves.size() == 0) {
+      stream.print("No DataNodes");
+      return;
+    }
+
+    // Build a map of rack -> nodes from the datanode report
+    HashMap<String, TreeSet<String>> tree =
+        new HashMap<String, TreeSet<String>>();

Review comment:
Can we do:
Map<String, TreeSet<String>> tree = new HashMap<>();

## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NetworkTopologyServlet.java
@@ -0,0 +1,115 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.namenode;
+
+import
[jira] [Work logged] (HDFS-15815) if required storageType are unavailable, log the failed reason during choosing Datanode
[ https://issues.apache.org/jira/browse/HDFS-15815?focusedWorklogId=581512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581512 ]

ASF GitHub Bot logged work on HDFS-15815:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Apr/21 03:36
Start Date: 13/Apr/21 03:36
Worklog Time Spent: 10m

Work Description: jojochuang commented on pull request #2882:
URL: https://github.com/apache/hadoop/pull/2882#issuecomment-818406777

@ayushtkn fyi will merge later if no objections.

Issue Time Tracking
-------------------
Worklog Id: (was: 581512)
Time Spent: 40m (was: 0.5h)

> if required storageType are unavailable, log the failed reason during
> choosing Datanode
> ---------------------------------------------------------------------
>
> Key: HDFS-15815
> URL: https://issues.apache.org/jira/browse/HDFS-15815
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: block placement
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15815.001.patch, HDFS-15815.002.patch, HDFS-15815.003.patch
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> For better debugging, if the required storageType is unavailable, log the
> failed reason "NO_REQUIRED_STORAGE_TYPE" when choosing a Datanode.
[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15759: --- Fix Version/s: 3.1.5 > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.1.5, 3.2.3 > > Time Spent: 10h > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. > It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. > For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
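The HDFS-15759 verification scheme described above (decode one of the original inputs back from the freshly reconstructed outputs and compare it with the original) can be illustrated with a toy single-parity code instead of real RS-6-3. XOR parity stands in for the Reed-Solomon decoder here, so this is only a sketch of the idea, not the HDFS implementation:

```java
public class EcVerifySketch {
  // Toy "decoder": with a single XOR parity block, any one missing
  // block equals the XOR of all the remaining blocks.
  static int decodeMissing(int[] others) {
    int v = 0;
    for (int o : others) {
      v ^= o;
    }
    return v;
  }

  /**
   * Reconstruct d0 from [d1, d2, p], then verify the reconstruction by
   * decoding d1 back from [d0', d2, p] and comparing it with the
   * original d1, mirroring the RS-6-3 example in the issue.
   */
  public static boolean reconstructAndVerify(int d0, int d1, int d2) {
    int p = d0 ^ d1 ^ d2;                           // parity written at encode time
    int d0r = decodeMissing(new int[]{d1, d2, p});  // reconstruction under test
    int d1r = decodeMissing(new int[]{d0r, d2, p}); // re-decode a helper input
    return d1r == d1;  // mismatch => the reconstruction was corrupt
  }
}
```

As the issue notes, a corrupted reconstruction makes the comparison fail with high probability, so the task fails fast and the NameNode retries it rather than persisting silent corruption.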
[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly
[ https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581505 ] ASF GitHub Bot logged work on HDFS-15714: - Author: ASF GitHub Bot Created on: 13/Apr/21 03:11 Start Date: 13/Apr/21 03:11 Worklog Time Spent: 10m Work Description: PHILO-HE commented on pull request #2655: URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818398537 1) Yes, the LevelDB-based AliasMap is recommended to users, and the text-based AliasMap is just for unit tests. In this patch, we made a few code changes to the AliasMap. You may note that it was initially introduced by the community a few years ago. 2) For namenode HA, we have not tested this feature yet. I think there are two main considerations. Firstly, the mount operation should be recovered on the new NN, so that the mounted remote storages are "visible" to the new active NN. Since we log mount info in the edit log for each mount request, this may not be a problem. Secondly, key info currently kept in memory should be made available to the other NNs, e.g., the key tracking info used when syncing data to remote storage, to guarantee data consistency even if the active NN changes. Frankly speaking, provided storage is still an experimental feature, so there may still be a large gap to productization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581505) Time Spent: 2.5h (was: 2h 20m) > HDFS Provided Storage Read/Write Mount Support On-the-fly > - > > Key: HDFS-15714 > URL: https://issues.apache.org/jira/browse/HDFS-15714 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15714-01.patch, > HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf > > Time Spent: 2.5h > Remaining Estimate: 0h > > HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. > In HDFS-9806, PROVIDED storage type was introduced to HDFS. Through > configuring external storage with PROVIDED tag for DataNode, user can enable > application to access data stored externally from HDFS side. However, there > are two issues need to be addressed. Firstly, mounting external storage > on-the-fly, namely dynamic mount, is lacking. It is necessary to get it > supported to flexibly combine HDFS with an external storage at runtime. > Secondly, PS write is not supported by current HDFS. But in real > applications, it is common to transfer data bi-directionally for read/write > between HDFS and external storage. > Through this JIRA, we are presenting our work for PS write support and > dynamic mount support for both read & write. Please note in the community > several JIRAs have been filed for these topics. Our work is based on these > previous community work, with new design & implementation to support called > writeBack mount and enable admin to add any mount on-the-fly. We appreciate > those folks in the community for their great contribution! See their pending > JIRAs: HDFS-14805 & HDFS-12090. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581504=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581504 ] ASF GitHub Bot logged work on HDFS-15759: - Author: ASF GitHub Bot Created on: 13/Apr/21 03:10 Start Date: 13/Apr/21 03:10 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #2868: URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818398223 HDFS-15940 has fixed TestBlockRecovery -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581504) Time Spent: 10h (was: 9h 50m) > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 10h > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. > It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. 
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581501=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581501 ] ASF GitHub Bot logged work on HDFS-15759: - Author: ASF GitHub Bot Created on: 13/Apr/21 02:46 Start Date: 13/Apr/21 02:46 Worklog Time Spent: 10m Work Description: jojochuang merged pull request #2868: URL: https://github.com/apache/hadoop/pull/2868 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581501) Time Spent: 9h 40m (was: 9.5h) > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. > It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. 
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581502 ] ASF GitHub Bot logged work on HDFS-15759: - Author: ASF GitHub Bot Created on: 13/Apr/21 02:46 Start Date: 13/Apr/21 02:46 Worklog Time Spent: 10m Work Description: jojochuang commented on pull request #2868: URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818391251 Thanks! @ferhui -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581502) Time Spent: 9h 50m (was: 9h 40m) > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 9h 50m > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. > It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. 
> For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-15759: --- Fix Version/s: 3.2.3 > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0, 3.2.3 > > Time Spent: 9h 40m > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. > It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. > For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319824#comment-17319824 ] Hui Fei commented on HDFS-15423: Hi, compilation fails on trunk. Maybe it's related to this. RouterWebHdfsMethods#chooseDatanode {code:java} resolvedNs = rpcServer.getCreateLocation(path).getNameserviceId(); {code} But getCreateLocation is defined as follows, so it is called with the wrong arguments. {code:java} RemoteLocation getCreateLocation( final String src, final List locations) throws IOException { ... }{code} > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Reopened] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei reopened HDFS-15423: > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly pick one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15759) EC: Verify EC reconstruction correctness on DataNode
[ https://issues.apache.org/jira/browse/HDFS-15759?focusedWorklogId=581492&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581492 ] ASF GitHub Bot logged work on HDFS-15759: - Author: ASF GitHub Bot Created on: 13/Apr/21 02:01 Start Date: 13/Apr/21 02:01 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #2868: URL: https://github.com/apache/hadoop/pull/2868#issuecomment-818375830 @jojochuang Thanks. The failed tests are unrelated; they passed locally except TestBlockRecovery, and TestBlockRecovery fails without this PR, so I think it's not related to this PR. I will check it on trunk. +1 for this PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581492) Time Spent: 9.5h (was: 9h 20m) > EC: Verify EC reconstruction correctness on DataNode > > > Key: HDFS-15759 > URL: https://issues.apache.org/jira/browse/HDFS-15759 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, ec, erasure-coding >Affects Versions: 3.4.0 >Reporter: Toshihiko Uchida >Assignee: Toshihiko Uchida >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 9.5h > Remaining Estimate: 0h > > EC reconstruction on DataNode has caused data corruption: HDFS-14768, > HDFS-15186 and HDFS-15240. Those issues occur under specific conditions and > the corruption is neither detected nor auto-healed by HDFS. It is obviously > hard for users to monitor data integrity by themselves, and even if they find > corrupted data, it is difficult or sometimes impossible to recover them. > To prevent further data corruption issues, this feature proposes a simple and > effective way to verify EC reconstruction correctness on DataNode at each > reconstruction process. 
> It verifies correctness of outputs decoded from inputs as follows: > 1. Decoding an input with the outputs; > 2. Compare the decoded input with the original input. > For instance, in RS-6-3, assume that outputs [d1, p1] are decoded from inputs > [d0, d2, d3, d4, d5, p0]. Then the verification is done by decoding d0 from > [d1, d2, d3, d4, d5, p1], and comparing the original and decoded data of d0. > When an EC reconstruction task goes wrong, the comparison will fail with high > probability. > Then the task will also fail and be retried by NameNode. > The next reconstruction will succeed if the condition triggered the failure > is gone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly
[ https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581491 ] ASF GitHub Bot logged work on HDFS-15714: - Author: ASF GitHub Bot Created on: 13/Apr/21 01:58 Start Date: 13/Apr/21 01:58 Worklog Time Spent: 10m Work Description: Zhangshunyu edited a comment on pull request #2655: URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818374335 Currently, the alias map is based on LevelDB, and it does not support namenode HA, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581491) Time Spent: 2h 20m (was: 2h 10m) > HDFS Provided Storage Read/Write Mount Support On-the-fly > - > > Key: HDFS-15714 > URL: https://issues.apache.org/jira/browse/HDFS-15714 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15714-01.patch, > HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf > > Time Spent: 2h 20m > Remaining Estimate: 0h > > HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. > In HDFS-9806, PROVIDED storage type was introduced to HDFS. Through > configuring external storage with PROVIDED tag for DataNode, users can enable > applications to access data stored externally from the HDFS side. However, there > are two issues that need to be addressed. Firstly, mounting external storage > on-the-fly, namely dynamic mount, is lacking. It is necessary to get it > supported to flexibly combine HDFS with an external storage at runtime. > Secondly, PS write is not supported by current HDFS. 
But in real > applications, it is common to transfer data bi-directionally for read/write > between HDFS and external storage. > Through this JIRA, we are presenting our work for PS write support and > dynamic mount support for both read & write. Please note in the community > several JIRAs have been filed for these topics. Our work is based on these > previous community work, with new design & implementation to support called > writeBack mount and enable admin to add any mount on-the-fly. We appreciate > those folks in the community for their great contribution! See their pending > JIRAs: HDFS-14805 & HDFS-12090. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15714) HDFS Provided Storage Read/Write Mount Support On-the-fly
[ https://issues.apache.org/jira/browse/HDFS-15714?focusedWorklogId=581490&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581490 ] ASF GitHub Bot logged work on HDFS-15714: - Author: ASF GitHub Bot Created on: 13/Apr/21 01:57 Start Date: 13/Apr/21 01:57 Worklog Time Spent: 10m Work Description: Zhangshunyu commented on pull request #2655: URL: https://github.com/apache/hadoop/pull/2655#issuecomment-818374335 Currently, the alias map is based on LevelDB, and it does not support namenode HA, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581490) Time Spent: 2h 10m (was: 2h) > HDFS Provided Storage Read/Write Mount Support On-the-fly > - > > Key: HDFS-15714 > URL: https://issues.apache.org/jira/browse/HDFS-15714 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode, namenode >Affects Versions: 3.4.0 >Reporter: Feilong He >Assignee: Feilong He >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15714-01.patch, > HDFS_Provided_Storage_Design-V1.pdf, HDFS_Provided_Storage_Performance-V1.pdf > > Time Spent: 2h 10m > Remaining Estimate: 0h > > HDFS Provided Storage (PS) is a feature to tier HDFS over other file systems. > In HDFS-9806, PROVIDED storage type was introduced to HDFS. Through > configuring external storage with PROVIDED tag for DataNode, users can enable > applications to access data stored externally from the HDFS side. However, there > are two issues that need to be addressed. Firstly, mounting external storage > on-the-fly, namely dynamic mount, is lacking. It is necessary to get it > supported to flexibly combine HDFS with an external storage at runtime. > Secondly, PS write is not supported by current HDFS. 
But in real > applications, it is common to transfer data bi-directionally for read/write > between HDFS and external storage. > Through this JIRA, we are presenting our work for PS write support and > dynamic mount support for both read & write. Please note in the community > several JIRAs have been filed for these topics. Our work is based on these > previous community work, with new design & implementation to support called > writeBack mount and enable admin to add any mount on-the-fly. We appreciate > those folks in the community for their great contribution! See their pending > JIRAs: HDFS-14805 & HDFS-12090. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15970) Print network topology on the web
[ https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15970: --- Summary: Print network topology on the web (was: Print network topology on web) > Print network topology on the web > - > > Key: HDFS-15970 > URL: https://issues.apache.org/jira/browse/HDFS-15970 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Attachments: hdfs-topology.jpg, hdfs-web.jpg > > Time Spent: 0.5h > Remaining Estimate: 0h > > In order to query the network topology information conveniently, we can print > it on the web. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319724#comment-17319724 ] Fengnan Li commented on HDFS-15423: --- Thanks [~elgoiri] [~ayushtkn] for the review! Let's see whether it can fix [HDFS-15878|https://issues.apache.org/jira/browse/HDFS-15878] > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly pick one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatible between 3.x and 2.6
[ https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319704#comment-17319704 ] Ayush Saxena commented on HDFS-15660: - {quote}As version 3.1, 3.2 and 3.3 already contain the new storage type, it should be okay to do the upgrade. So I don't cherry-pick to other branches. {quote} HDFS-15025 added a new storage type and that is in 3.4.0, so IMO we should cherry-pick this to the 3.x branches? Otherwise, the problem that was faced for the PROVIDED storage type will also happen for this one, and for any new storage type added in the future. > StorageTypeProto is not compatible between 3.x and 2.6 > --- > > Key: HDFS-15660 > URL: https://issues.apache.org/jira/browse/HDFS-15660 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1 >Reporter: Ryan Wu >Assignee: Ryan Wu >Priority: Major > Fix For: 2.9.3, 3.4.0, 2.10.2 > > Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch > > > In our case, when the NN had been upgraded to 3.1.3 and the DN's version was still 2.6, > we found that when Hive called the getContentSummary method, the client and server were > not compatible because hadoop 3 added the new PROVIDED storage type. > {code:java} > // code placeholder > 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while > invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over > x/x:8020. Trying to fail over immediately. 
> java.io.IOException: com.google.protobuf.ServiceException: > com.google.protobuf.UninitializedMessageException: Message missing required > fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713) > at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) > at > org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) > at > org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118) > at org.apache.hadoop.fs.shell.Command.run(Command.java:165) > at 
org.apache.hadoop.fs.FsShell.run(FsShell.java:315) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:372) > Caused by: com.google.protobuf.ServiceException: > com.google.protobuf.UninitializedMessageException: Message missing required > fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272) > at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816) > ... 23 more > Caused by: com.google.protobuf.UninitializedMessageException: Message missing > required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) > at >
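Editor's note: the incompatibility in the trace above follows from proto2 semantics. An old client that receives an enum wire value it was compiled without treats the carrying field as unset, and an unset `required` field makes the whole message fail to build with UninitializedMessageException. A minimal Java sketch of that decoding behaviour (the enum and its ordinals are illustrative only, not the real StorageTypeProto wire values):

```java
// Editorial sketch (hypothetical): how a proto2 decoder on an old client
// reacts to an enum wire value it was compiled without.
import java.util.Optional;

public class EnumCompatSketch {
    // Storage types a 2.6-era client knows about; ordinals here are
    // illustrative, not the real StorageTypeProto wire values.
    enum OldStorageType { DISK, SSD, ARCHIVE, RAM_DISK }

    // Proto2 semantics: an unknown enum value does not map to any constant,
    // so the field that carried it is treated as unset. If that field is
    // required, building the message fails with
    // UninitializedMessageException, as in the stack trace above.
    static Optional<OldStorageType> decode(int wireValue) {
        OldStorageType[] known = OldStorageType.values();
        return (wireValue >= 0 && wireValue < known.length)
            ? Optional.of(known[wireValue])
            : Optional.empty();   // e.g. PROVIDED, added by 3.x namenodes
    }

    public static void main(String[] args) {
        System.out.println(decode(0).isPresent());  // true  (DISK)
        System.out.println(decode(4).isPresent());  // false (unknown type)
    }
}
```

This is also why adding enum values to fields that old releases parse as `required` needs a cherry-pick to every branch that may talk to newer servers.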
[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319676#comment-17319676 ] Íñigo Goiri commented on HDFS-15423: Thanks [~fengnanli] for the improvement. Merged PR 2605. > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly pick one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15660) StorageTypeProto is not compatiable between 3.x and 2.6
[ https://issues.apache.org/jira/browse/HDFS-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319677#comment-17319677 ] Jim Brennan commented on HDFS-15660: I share [~weichiu]'s confusion. If this change was put in trunk and all branch-2 branches, I don't understand why we would skip branch-3.1, branch-3.2, and branch-3.3? It may not be strictly needed, but shouldn't we keep the change consistent across branches? > StorageTypeProto is not compatiable between 3.x and 2.6 > --- > > Key: HDFS-15660 > URL: https://issues.apache.org/jira/browse/HDFS-15660 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.0.0, 3.0.1, 2.9.2, 2.8.5, 2.7.7, 2.10.1 >Reporter: Ryan Wu >Assignee: Ryan Wu >Priority: Major > Fix For: 2.9.3, 3.4.0, 2.10.2 > > Attachments: HDFS-15660.002.patch, HDFS-15660.003.patch > > > In our case, when nn has upgraded to 3.1.3 and dn’s version was still 2.6, > we found hive to call getContentSummary method , the client and server was > not compatible because of hadoop3 added new PROVIDED storage type. > {code:java} > // code placeholder > 20/04/15 14:28:35 INFO retry.RetryInvocationHandler---main: Exception while > invoking getContentSummary of class ClientNamenodeProtocolTranslatorPB over > x/x:8020. Trying to fail over immediately. 
> java.io.IOException: com.google.protobuf.ServiceException: > com.google.protobuf.UninitializedMessageException: Message missing required > fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > org.apache.hadoop.ipc.ProtobufHelper.getRemoteException(ProtobufHelper.java:47) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:819) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:258) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > at com.sun.proxy.$Proxy11.getContentSummary(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getContentSummary(DFSClient.java:3144) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:706) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:702) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getContentSummary(DistributedFileSystem.java:713) > at org.apache.hadoop.fs.shell.Count.processPath(Count.java:109) > at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:317) > at > org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:289) > at > org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271) > at > org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255) > at > org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118) > at org.apache.hadoop.fs.shell.Command.run(Command.java:165) > at 
org.apache.hadoop.fs.FsShell.run(FsShell.java:315) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) > at org.apache.hadoop.fs.FsShell.main(FsShell.java:372) > Caused by: com.google.protobuf.ServiceException: > com.google.protobuf.UninitializedMessageException: Message missing required > fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:272) > at com.sun.proxy.$Proxy10.getContentSummary(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getContentSummary(ClientNamenodeProtocolTranslatorPB.java:816) > ... 23 more > Caused by: com.google.protobuf.UninitializedMessageException: Message missing > required fields: summary.typeQuotaInfos.typeQuotaInfo[3].type > at > com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetContentSummaryResponseProto$Builder.build(ClientNamenodeProtocolProtos.java:65392) > at >
[jira] [Work logged] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?focusedWorklogId=581288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581288 ] ASF GitHub Bot logged work on HDFS-15423: - Author: ASF GitHub Bot Created on: 12/Apr/21 19:42 Start Date: 12/Apr/21 19:42 Worklog Time Spent: 10m Work Description: goiri merged pull request #2605: URL: https://github.com/apache/hadoop/pull/2605 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581288) Time Spent: 6h 10m (was: 6h) > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h 10m > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly pick one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters
[ https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved HDFS-15423. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > RBF: WebHDFS create shouldn't choose DN from all sub-clusters > - > > Key: HDFS-15423 > URL: https://issues.apache.org/jira/browse/HDFS-15423 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf, webhdfs >Reporter: Chao Sun >Assignee: Fengnan Li >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 6h > Remaining Estimate: 0h > > In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} > first gets all DNs via {{getDatanodeReport}}, and then randomly pick one from > the list via {{getRandomDatanode}}. This logic doesn't seem correct as it > should pick a DN for the specific cluster(s) of the input {{path}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
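Editor's note: the fix discussed in HDFS-15423 narrows the candidate set before the random pick — choose only among datanodes of the sub-cluster that owns the path, not the union returned by {{getDatanodeReport}}. A minimal Java sketch of the intended selection (names and list contents are hypothetical, not the actual RouterWebHdfsMethods code):

```java
// Editorial sketch (hypothetical names): pick a random datanode from the
// sub-cluster that owns the path, not from the union of all sub-clusters.
import java.util.List;
import java.util.Random;

public class PickDnSketch {
    static String pickRandom(List<String> subClusterDns, Random rng) {
        // The essential change: the candidate list is already filtered to
        // the relevant sub-cluster before the random choice is made.
        return subClusterDns.get(rng.nextInt(subClusterDns.size()));
    }

    public static void main(String[] args) {
        List<String> owned = List.of("dn1", "dn2", "dn3");
        String picked = pickRandom(owned, new Random());
        System.out.println(owned.contains(picked));  // true
    }
}
```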
[jira] [Work logged] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?focusedWorklogId=581287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581287 ] ASF GitHub Bot logged work on HDFS-15971: - Author: ASF GitHub Bot Created on: 12/Apr/21 19:42 Start Date: 12/Apr/21 19:42 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2898: URL: https://github.com/apache/hadoop/pull/2898#issuecomment-818083196 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 33s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 51s | | trunk passed | | +1 :green_heart: | compile | 2m 44s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 2m 44s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | mvnsite | 0m 29s | | trunk passed | | +1 :green_heart: | shadedclient | 53m 23s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 17s | | the patch passed | | +1 :green_heart: | compile | 2m 32s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | cc | 2m 32s | | the patch passed | | +1 :green_heart: | golang | 2m 32s | | the patch passed | | +1 :green_heart: | javac | 2m 32s | | the patch passed | | +1 :green_heart: | compile | 2m 34s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | cc | 2m 34s | | the patch passed | | +1 :green_heart: | golang | 2m 34s | | the patch passed | | +1 :green_heart: | javac | 2m 34s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | mvnsite | 0m 20s | | the patch passed | | +1 :green_heart: | shadedclient | 13m 13s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 31m 48s | | hadoop-hdfs-native-client in the patch passed. | | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. 
| | | | 107m 33s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2898 | | Optional Tests | dupname asflicense compile cc mvnsite javac unit codespell golang | | uname | Linux 939cc4c9ed63 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 2b1da158404546225a694691400c5271d4f631ac | | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/testReport/ | | Max. process+thread count | 713 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2898/1/console | | versions | git=2.25.1 maven=3.6.3 | | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581287) Time Spent: 20m (was: 10m) > Make mkstemp cross platform > --- > > Key: HDFS-15971 > URL:
[jira] [Work logged] (HDFS-15970) Print network topology on web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581232=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581232 ] ASF GitHub Bot logged work on HDFS-15970: - Author: ASF GitHub Bot Created on: 12/Apr/21 18:31 Start Date: 12/Apr/21 18:31 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2896: URL: https://github.com/apache/hadoop/pull/2896#issuecomment-818033898 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 36s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 54s | | trunk passed | | +1 :green_heart: | compile | 1m 20s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 1m 12s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 0s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 21s | | trunk passed | | +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 27s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 5s | | trunk passed | | +1 :green_heart: | shadedclient | 16m 17s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 7s | | the patch passed | | +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 1m 12s | | the patch passed | | +1 :green_heart: | compile | 1m 6s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | javac | 1m 6s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 53s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 5 unchanged - 1 fixed = 6 total (was 6) | | +1 :green_heart: | mvnsite | 1m 13s | | the patch passed | | +1 :green_heart: | javadoc | 0m 44s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 15s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 10s | | the patch passed | | +1 :green_heart: | shadedclient | 15m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 233m 16s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. 
| | | | 319m 39s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots | | | hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys | | | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks | | | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.server.datanode.TestDirectoryScanner | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2896/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2896 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell | | uname | Linux 3e68afa520ab 4.15.0-136-generic #140-Ubuntu SMP Thu Jan 28 05:20:47 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality |
[jira] [Work logged] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?focusedWorklogId=581199=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581199 ] ASF GitHub Bot logged work on HDFS-15971: - Author: ASF GitHub Bot Created on: 12/Apr/21 17:53 Start Date: 12/Apr/21 17:53 Worklog Time Spent: 10m Work Description: GauthamBanasandra opened a new pull request #2898: URL: https://github.com/apache/hadoop/pull/2898 * mkstemp isn't available in Visual C++. This PR implements the necessary cross platform implementation for mkstemp. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581199) Remaining Estimate: 0h Time Spent: 10m > Make mkstemp cross platform > --- > > Key: HDFS-15971 > URL: https://issues.apache.org/jira/browse/HDFS-15971 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs++ >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > mkstemp isn't available in Visual C++. Need to make it cross platform. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
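Editor's note: the actual change in HDFS-15971 is C++ in libhdfs++, but the contract mkstemp provides — a unique name plus exclusive file creation — has a portable analogue in the JDK, sketched here for illustration only (the class and method names are ours, not code from the PR):

```java
// Editorial sketch: mkstemp's contract (unique name + exclusive creation)
// has a portable JDK analogue; class and method names here are ours,
// not code from the PR, which is C++ in libhdfs++.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileSketch {
    static Path createTemp(String prefix) throws IOException {
        // Files.createTempFile generates a unique name and creates the
        // file, which is what mkstemp guarantees on POSIX.
        return Files.createTempFile(prefix, ".tmp");
    }

    public static void main(String[] args) throws IOException {
        Path p = createTemp("hdfs_tmp_");
        System.out.println(Files.exists(p));  // true
        Files.delete(p);                      // clean up
    }
}
```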
[jira] [Updated] (HDFS-15971) Make mkstemp cross platform
[ https://issues.apache.org/jira/browse/HDFS-15971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-15971: -- Labels: pull-request-available (was: ) > Make mkstemp cross platform > --- > > Key: HDFS-15971 > URL: https://issues.apache.org/jira/browse/HDFS-15971 > Project: Hadoop HDFS > Issue Type: Improvement > Components: libhdfs++ >Affects Versions: 3.4.0 >Reporter: Gautham Banasandra >Assignee: Gautham Banasandra >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > mkstemp isn't available in Visual C++. Need to make it cross platform. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15971) Make mkstemp cross platform
Gautham Banasandra created HDFS-15971: - Summary: Make mkstemp cross platform Key: HDFS-15971 URL: https://issues.apache.org/jira/browse/HDFS-15971 Project: Hadoop HDFS Issue Type: Improvement Components: libhdfs++ Affects Versions: 3.4.0 Reporter: Gautham Banasandra Assignee: Gautham Banasandra mkstemp isn't available in Visual C++. Need to make it cross platform. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes
[ https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=581140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581140 ] ASF GitHub Bot logged work on HDFS-15785: - Author: ASF GitHub Bot Created on: 12/Apr/21 16:50 Start Date: 12/Apr/21 16:50 Worklog Time Spent: 10m Work Description: fengnanli commented on a change in pull request #2639: URL: https://github.com/apache/hadoop/pull/2639#discussion_r574117112 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java ## @@ -647,6 +634,58 @@ public static String addKeySuffixes(String key, String... suffixes) { getNNLifelineRpcAddressesForCluster(Configuration conf) throws IOException { +Collection parentNameServices = getParentNameServices(conf); + +return getAddressesForNsIds(conf, parentNameServices, null, +DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY); + } + + // + /** + * Returns the configured address for all NameNodes in the cluster. + * This is similar with DFSUtilClient.getAddressesForNsIds() + * but can access DFSConfigKeys. + * + * @param conf configuration + * @param defaultAddress default address to return in case key is not found. + * @param keys Set of keys to look for in the order of preference + * + * @return a map(nameserviceId to map(namenodeId to InetSocketAddress)) + */ + static Map> getAddressesForNsIds( Review comment: Can we try this to reduce the code duplication? Override function `DFSUtilClient.getAddressesForNsIds()` by adding a boolean var indicating whether to resolve (the var is fetched from the config). Inside `DFSUtilClient.getAddressesForNameserviceId`, add another override that takes the boolean, and have the current one call it with the value false. If the var is true, do the DNS resolution and return the resolved addresses. 
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java ## @@ -1557,6 +1557,17 @@ public static final double DFS_DATANODE_RESERVE_FOR_ARCHIVE_DEFAULT_PERCENTAGE_DEFAULT = 0.0; + + public static final String + DFS_NAMESERVICES_RESOLUTION_ENABLED = Review comment: If we maintain only one config across nn, qjm, zkfc and dn, this is an issue since the other three don't support DNS yet. I am thinking about how to do it for now and it requires some refactor in places such as `DFSUtil.getSuffixIDs` (used by zkfc). I will follow up on this soon. ATM we can keep a separate config for DN only as a short term solution. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581140) Time Spent: 2.5h (was: 2h 20m) > Datanode to support using DNS to resolve nameservices to IP addresses to get > list of namenodes > -- > > Key: HDFS-15785 > URL: https://issues.apache.org/jira/browse/HDFS-15785 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently as HDFS supports observers, multiple-standby and router, the > namenode hosts are changing frequently in large deployment, we can consider > supporting https://issues.apache.org/jira/browse/HDFS-14118 on datanode to > reduce the need to update config frequently on all datanodes. In that case, > datanode and clients can use the same set of config as well. > Basically we can resolve the DNS and generate namenode for each IP behind it. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
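Editor's note: the core idea discussed in the review above — resolve one nameservice DNS name to every backing IP and synthesize one namenode address per IP — can be sketched in a few lines of plain JDK code (this is an illustration, not the code under review in PR #2639):

```java
// Editorial sketch (not the PR #2639 code): resolve one nameservice DNS
// name to all of its backing IPs and build one namenode address per IP.
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class DnsResolveSketch {
    static List<InetSocketAddress> resolveNameservice(String host, int port)
            throws UnknownHostException {
        List<InetSocketAddress> addrs = new ArrayList<>();
        // getAllByName returns every A/AAAA record behind the name, so a
        // single DNS entry can stand in for all namenodes of a nameservice.
        for (InetAddress ip : InetAddress.getAllByName(host)) {
            addrs.add(new InetSocketAddress(ip, port));
        }
        return addrs;
    }

    public static void main(String[] args) throws UnknownHostException {
        // "localhost" resolves locally, so this needs no external network.
        List<InetSocketAddress> nns = resolveNameservice("localhost", 8020);
        System.out.println(nns.size() >= 1);  // true
    }
}
```

Datanodes and clients could then share one short config instead of per-namenode host lists, which is the motivation stated in the issue description.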
[jira] [Commented] (HDFS-15878) Flaky test TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in Trunk
[ https://issues.apache.org/jira/browse/HDFS-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319573#comment-17319573 ] Fengnan Li commented on HDFS-15878: --- Let's wait after HDFS-15423 is committed. Thanks. > Flaky test > TestRouterWebHDFSContractCreate>AbstractContractCreateTest#testSyncable in > Trunk > --- > > Key: HDFS-15878 > URL: https://issues.apache.org/jira/browse/HDFS-15878 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs, rbf >Reporter: Renukaprasad C >Assignee: Fengnan Li >Priority: Major > > ERROR] Tests run: 16, Failures: 0, Errors: 1, Skipped: 2, Time elapsed: > 24.627 s <<< FAILURE! - in > org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate > [ERROR] > testSyncable(org.apache.hadoop.fs.contract.router.web.TestRouterWebHDFSContractCreate) > Time elapsed: 0.222 s <<< ERROR! > java.io.FileNotFoundException: File /test/testSyncable not found. > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:110) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:576) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$900(WebHdfsFileSystem.java:146) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:892) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:858) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:652) > at > 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:690) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:686) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.getRedirectedUrl(WebHdfsFileSystem.java:2307) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.(WebHdfsFileSystem.java:2296) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem$WebHdfsInputStream.(WebHdfsFileSystem.java:2176) > at > org.apache.hadoop.hdfs.web.WebHdfsFileSystem.open(WebHdfsFileSystem.java:1610) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:975) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.validateSyncableSemantics(AbstractContractCreateTest.java:556) > at > org.apache.hadoop.fs.contract.AbstractContractCreateTest.testSyncable(AbstractContractCreateTest.java:459) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at > 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: > org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File > /test/testSyncable not found. > at >
[jira] [Work logged] (HDFS-15970) Print network topology on web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=581108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581108 ] ASF GitHub Bot logged work on HDFS-15970: - Author: ASF GitHub Bot Created on: 12/Apr/21 16:00 Start Date: 12/Apr/21 16:00 Worklog Time Spent: 10m Work Description: ayushtkn commented on pull request #2896: URL: https://github.com/apache/hadoop/pull/2896#issuecomment-817930124 Can you extend this to the rbf ui as well? the federationhealth.html and federationhealth.js -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 581108) Time Spent: 20m (was: 10m) > Print network topology on web > - > > Key: HDFS-15970 > URL: https://issues.apache.org/jira/browse/HDFS-15970 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Attachments: hdfs-topology.jpg, hdfs-web.jpg > > Time Spent: 20m > Remaining Estimate: 0h > > In order to query the network topology information conveniently, we can print > it on the web. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15956) Provide utility class for FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-15956: Labels: (was: pull-request-available) > Provide utility class for FSNamesystem > -- > > Key: HDFS-15956 > URL: https://issues.apache.org/jira/browse/HDFS-15956 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > With ever-growing functionalities, FSNamesystem has become very huge (with > ~9k lines of code) over a period of time, we should provide a utility class > and refactor as many basic utility functions to new class as we can. > With any further suggestions, we can create sub-tasks of this Jira and work > on them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (HDFS-15965) Please upgrade the log4j dependency to log4j2
[ https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani updated HDFS-15965: Comment: was deleted (was: This is already being discussed on HADOOP-16206) > Please upgrade the log4j dependency to log4j2 > - > > Key: HDFS-15965 > URL: https://issues.apache.org/jira/browse/HDFS-15965 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Affects Versions: 3.3.0, 3.2.1, 3.2.2 >Reporter: helen huang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > The log4j dependency being used by hadoop-common is currently version 1.2.17. > Our fortify scan picked up a couple of issues with this dependency. Please > update it to the latest version of the log4j2 dependencies: > > org.apache.logging.log4j > log4j-api > 2.14.1 > > > org.apache.logging.log4j > log4j-core > 2.14.1 > > > The slf4j dependency will need to be updated as well after you upgrade log4j > to log4j2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15970) Print network topology on web
[ https://issues.apache.org/jira/browse/HDFS-15970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-15970: -- Labels: pull-request-available (was: ) > Print network topology on web > - > > Key: HDFS-15970 > URL: https://issues.apache.org/jira/browse/HDFS-15970 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Labels: pull-request-available > Attachments: hdfs-topology.jpg, hdfs-web.jpg > > Time Spent: 10m > Remaining Estimate: 0h > > In order to query the network topology information conveniently, we can print > it on the web. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15970) Print network topology on web
[ https://issues.apache.org/jira/browse/HDFS-15970?focusedWorklogId=580974=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580974 ] ASF GitHub Bot logged work on HDFS-15970: - Author: ASF GitHub Bot Created on: 12/Apr/21 13:11 Start Date: 12/Apr/21 13:11 Worklog Time Spent: 10m Work Description: tomscut opened a new pull request #2896: URL: https://github.com/apache/hadoop/pull/2896 JIRA: [HDFS-15970](https://issues.apache.org/jira/browse/HDFS-15970) In order to query the network topology information conveniently, we can print it on the web. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580974) Remaining Estimate: 0h Time Spent: 10m > Print network topology on web > - > > Key: HDFS-15970 > URL: https://issues.apache.org/jira/browse/HDFS-15970 > Project: Hadoop HDFS > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Attachments: hdfs-topology.jpg, hdfs-web.jpg > > Time Spent: 10m > Remaining Estimate: 0h > > In order to query the network topology information conveniently, we can print > it on the web. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15970) Print network topology on web
tomscut created HDFS-15970: -- Summary: Print network topology on web Key: HDFS-15970 URL: https://issues.apache.org/jira/browse/HDFS-15970 Project: Hadoop HDFS Issue Type: Wish Reporter: tomscut Assignee: tomscut Attachments: hdfs-topology.jpg, hdfs-web.jpg In order to query the network topology information conveniently, we can print it on the web. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15965) Please upgrade the log4j dependency to log4j2
[ https://issues.apache.org/jira/browse/HDFS-15965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319447#comment-17319447 ] Viraj Jasani commented on HDFS-15965: - This is already being discussed on HADOOP-16206 > Please upgrade the log4j dependency to log4j2 > - > > Key: HDFS-15965 > URL: https://issues.apache.org/jira/browse/HDFS-15965 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsclient >Affects Versions: 3.3.0, 3.2.1, 3.2.2 >Reporter: helen huang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > The log4j dependency being used by hadoop-common is currently version 1.2.17. > Our fortify scan picked up a couple of issues with this dependency. Please > update it to the latest version of the log4j2 dependencies: > > org.apache.logging.log4j > log4j-api > 2.14.1 > > > org.apache.logging.log4j > log4j-core > 2.14.1 > > > The slf4j dependency will need to be updated as well after you upgrade log4j > to log4j2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
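The requested coordinates, laid out as a Maven dependency fragment, would read roughly as follows. This is a sketch only: the group, artifact, and version values are taken from the issue description above, and the surrounding element structure is the standard pom.xml layout, not a quote from the original report.

```xml
<!-- Sketch of the requested log4j2 upgrade; values as quoted in the issue. -->
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-api</artifactId>
  <version>2.14.1</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.14.1</version>
</dependency>
```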
[jira] [Commented] (HDFS-15969) DFSClient prints token information a string format
[ https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319416#comment-17319416 ] Hadoop QA commented on HDFS-15969: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 31m 42s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. 
{color} | || || || || {color:brown} trunk Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 24m 11s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green}{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 37s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 21m 49s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are enabled, using SpotBugs. 
{color} | | {color:green}+1{color} | {color:green} spotbugs {color} | {color:green} 2m 49s{color} | {color:green}{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 1s{color} | {color:green}{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 56s{color} | {color:green}{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} | |
[jira] [Commented] (HDFS-15175) Multiple CloseOp shared block instance causes the standby namenode to crash when rolling editlog
[ https://issues.apache.org/jira/browse/HDFS-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319413#comment-17319413 ] tomscut commented on HDFS-15175: Hi [~max2049],thanks for your work. The test case you provide can reflect this problem. But can we reproduce the issue by calling the relevant APIs (create/close/truncate)? > Multiple CloseOp shared block instance causes the standby namenode to crash > when rolling editlog > > > Key: HDFS-15175 > URL: https://issues.apache.org/jira/browse/HDFS-15175 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.9.2 >Reporter: Yicong Cai >Assignee: Wan Chang >Priority: Critical > Labels: NameNode > Attachments: HDFS-15175-trunk.1.patch > > > > {panel:title=Crash exception} > 2020-02-16 09:24:46,426 [507844305] - ERROR [Edit log > tailer:FSEditLogLoader@245] - Encountered exception on operation CloseOp > [length=0, inodeId=0, path=..., replication=3, mtime=1581816138774, > atime=1581814760398, blockSize=536870912, blocks=[blk_5568434562_4495417845], > permissions=da_music:hdfs:rw-r-, aclEntries=null, clientName=, > clientMachine=, overwrite=false, storagePolicyId=0, opCode=OP_CLOSE, > txid=32625024993] > java.io.IOException: File is not under construction: .. 
> at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:442) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:237) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:146) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:891) > at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:872) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:262) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:395) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$300(EditLogTailer.java:348) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:365) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1873) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:479) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:361) > {panel} > > {panel:title=Editlog} > > OP_REASSIGN_LEASE > > 32625021150 > DFSClient_NONMAPREDUCE_-969060727_197760 > .. > DFSClient_NONMAPREDUCE_1000868229_201260 > > > .. > > OP_CLOSE > > 32625023743 > 0 > 0 > .. > 3 > 1581816135883 > 1581814760398 > 536870912 > > > false > > 5568434562 > 185818644 > 4495417845 > > > da_music > hdfs > 416 > > > > .. > > OP_TRUNCATE > > 32625024049 > .. > DFSClient_NONMAPREDUCE_1000868229_201260 > .. > 185818644 > 1581816136336 > > 5568434562 > 185818648 > 4495417845 > > > > .. > > OP_CLOSE > > 32625024993 > 0 > 0 > .. 
> 3 > 1581816138774 > 1581814760398 > 536870912 > > > false > > 5568434562 > 185818644 > 4495417845 > > > da_music > hdfs > 416 > > > > {panel} > > > The block size should be 185818648 in the first CloseOp. When truncate is > used, the block size becomes 185818644. The CloseOp/TruncateOp/CloseOp is > synchronized to the JournalNode in the same batch. The block used by CloseOp > twice is the same instance, which causes the first CloseOp has wrong block > size. When SNN rolling Editlog, TruncateOp does not make the file to the > UnderConstruction state. Then, when the second CloseOp is executed, the file > is not in the UnderConstruction state, and SNN crashes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
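The shared-instance failure described above can be sketched outside HDFS with a minimal, hypothetical Block/CloseOp pair. The names mirror the report but this is not the actual NameNode code: because both ops hold a reference to the same mutable block, the truncate's size change leaks into the already-recorded first close.

```java
public class SharedBlockSketch {
    static class Block {
        long numBytes;
        Block(long numBytes) { this.numBytes = numBytes; }
    }

    static class CloseOp {
        final Block block;  // holds a reference to the block, not a copy
        CloseOp(Block block) { this.block = block; }
    }

    public static void main(String[] args) {
        Block shared = new Block(185818648L);      // size when the file is first closed
        CloseOp firstClose = new CloseOp(shared);  // should have recorded 185818648

        shared.numBytes = 185818644L;              // truncate shrinks the same instance
        CloseOp secondClose = new CloseOp(shared); // 185818644 is correct for this op

        // Because both ops alias one instance, the first CloseOp now reports
        // the post-truncate size: the corruption the report describes.
        System.out.println("first close sees numBytes = " + firstClose.block.numBytes);
    }
}
```

A natural fix for this class of bug is to store a defensive copy of the block in each op rather than the shared instance, so later mutations cannot rewrite history.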
[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1
[ https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-15964: Fix Version/s: (was: 3.4.0) (was: 3.3.0) > Please update the okhttp version to 4.9.1 > - > > Key: HDFS-15964 > URL: https://issues.apache.org/jira/browse/HDFS-15964 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, dfsclient, security >Affects Versions: 3.3.0 >Reporter: helen huang >Priority: Major > > Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan > flagged two issues with this version. Please update it to the latest (It is > okhttp3 4.9.1 at this point). Thanks! > > com.squareup.okhttp3 > okhttp > 4.9.1 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15964) Please update the okhttp version to 4.9.1
[ https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319394#comment-17319394 ] Steve Loughran commented on HDFS-15964: --- Changes like this should be submitted as GitHub PRs. As this changes HDFS too, to ensure Yetus does the HDFS build/test the PR needs to make some (any) change in the HDFS module. Adding a newline to the hdfs pom should be enough - we won't merge that. Be aware: changing dependencies is among the most traumatic changes we can make. A single "change a line in a maven build" can break tests, cause downstream incompatibilities, trigger regressions in deployments which don't surface in unit tests, etc. There is never a *just* update a JAR. It's "update the JAR, see what breaks, come up with a plan/timetable to fix". This one should be low risk. But things related to guava, jackson, and log4j are project-spanning minefields. Further reading: http://steveloughran.blogspot.com/2016/05/fear-of-dependencies.html > Please update the okhttp version to 4.9.1 > - > > Key: HDFS-15964 > URL: https://issues.apache.org/jira/browse/HDFS-15964 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, dfsclient, security >Affects Versions: 3.3.0 >Reporter: helen huang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan > flagged two issues with this version. Please update it to the latest (It is > okhttp3 4.9.1 at this point). Thanks! > > com.squareup.okhttp3 > okhttp > 4.9.1 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15964) Please update the okhttp version to 4.9.1
[ https://issues.apache.org/jira/browse/HDFS-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-15964: -- Component/s: security build > Please update the okhttp version to 4.9.1 > - > > Key: HDFS-15964 > URL: https://issues.apache.org/jira/browse/HDFS-15964 > Project: Hadoop HDFS > Issue Type: Bug > Components: build, dfsclient, security >Affects Versions: 3.3.0 >Reporter: helen huang >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > Currently the okhttp used by the hdfs client is 2.7.5. Our fortify scan > flagged two issues with this version. Please update it to the latest (It is > okhttp3 4.9.1 at this point). Thanks! > > com.squareup.okhttp3 > okhttp > 4.9.1 > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15969) DFSClient prints token information a string format
[ https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15969: Status: Patch Available (was: Open) > DFSClient prints token information a string format > --- > > Key: HDFS-15969 > URL: https://issues.apache.org/jira/browse/HDFS-15969 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15969.001.patch > > > DFSclient prints token information in a string format, as this is sensitive > information it must be moved to debug level or can be exempted even from > debug level -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15969) DFSClient prints token information a string format
[ https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15969: Attachment: HDFS-15969.001.patch > DFSClient prints token information a string format > --- > > Key: HDFS-15969 > URL: https://issues.apache.org/jira/browse/HDFS-15969 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15969.001.patch > > > DFSclient prints token information in a string format, as this is sensitive > information it must be moved to debug level or can be exempted even from > debug level -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15967) Improve the log for Short Circuit Local Reads
[ https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15967: Status: Patch Available (was: In Progress) > Improve the log for Short Circuit Local Reads > - > > Key: HDFS-15967 > URL: https://issues.apache.org/jira/browse/HDFS-15967 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15967.001.patch > > > Improve the log for Short Circuit Local Reads -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15968) Improve the log for The DecayRpcScheduler
[ https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15968: Status: Patch Available (was: In Progress) > Improve the log for The DecayRpcScheduler > -- > > Key: HDFS-15968 > URL: https://issues.apache.org/jira/browse/HDFS-15968 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15968.001.patch > > > Improve the log for The DecayRpcScheduler to make use of the SLF4J logger > factory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15968) Improve the log for The DecayRpcScheduler
[ https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15968: Attachment: HDFS-15968.001.patch > Improve the log for The DecayRpcScheduler > -- > > Key: HDFS-15968 > URL: https://issues.apache.org/jira/browse/HDFS-15968 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15968.001.patch > > > Improve the log for The DecayRpcScheduler to make use of the SLF4J logger > factory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15967) Improve the log for Short Circuit Local Reads
[ https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bhavik Patel updated HDFS-15967: Attachment: HDFS-15967.001.patch > Improve the log for Short Circuit Local Reads > - > > Key: HDFS-15967 > URL: https://issues.apache.org/jira/browse/HDFS-15967 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > Attachments: HDFS-15967.001.patch > > > Improve the log for Short Circuit Local Reads -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15969) DFSClient prints token information a string format
[ https://issues.apache.org/jira/browse/HDFS-15969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319333#comment-17319333 ] Ayush Saxena commented on HDFS-15969: - There are a bunch of jiras removing this itself from debug, so you can chuck this off > DFSClient prints token information a string format > --- > > Key: HDFS-15969 > URL: https://issues.apache.org/jira/browse/HDFS-15969 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > > DFSclient prints token information in a string format, as this is sensitive > information it must be moved to debug level or can be exempted even from > debug level -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15969) DFSClient prints token information a string format
Bhavik Patel created HDFS-15969: --- Summary: DFSClient prints token information a string format Key: HDFS-15969 URL: https://issues.apache.org/jira/browse/HDFS-15969 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bhavik Patel Assignee: Bhavik Patel DFSclient prints token information in a string format, as this is sensitive information it must be moved to debug level or can be exempted even from debug level -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
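The mitigation this issue asks for, keeping token strings out of normal logs, comes down to gating the sensitive rendering on the debug level. A minimal sketch of that pattern follows; it uses java.util.logging so it is self-contained, whereas DFSClient itself logs via SLF4J, and `describeToken` is an invented helper, not a Hadoop API.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class TokenLogSketch {
    private static final Logger LOG = Logger.getLogger(TokenLogSketch.class.getName());

    // Hypothetical helper: render the token text only when debug-level
    // logging is enabled, so the secret never reaches INFO-level logs.
    static String describeToken(String tokenString) {
        return LOG.isLoggable(Level.FINE) ? tokenString : "<token redacted>";
    }

    public static void main(String[] args) {
        // The default java.util.logging level is INFO, so FINE is disabled
        // and the sensitive token string is redacted.
        System.out.println(describeToken("HDFS_DELEGATION_TOKEN id=42 secret=..."));
    }
}
```

The same shape in SLF4J would be an `if (LOG.isDebugEnabled())` guard around the token's `toString()`, which also avoids paying the string-conversion cost when debug is off.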
[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319274#comment-17319274 ] Ayush Saxena commented on HDFS-15614: - Thanx [~shashikant] for the responses, I will bother you a bit more: {quote}If a new directory is made snapshottable with feature flahg turned , .Trash directory gets created implicitly as a part of allowSnapshot call. I don't think there is an ambiguity here. {quote} I think I cleared this up in my question itself, So here is a test for that: {code:java} @Test public void testClientAmbiguity() throws Exception { Configuration conf = new HdfsConfiguration(); // Enable the feature conf.setBoolean("dfs.namenode.snapshot.trashroot.enabled", true); try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) { cluster.waitActive(); final DistributedFileSystem dfs = cluster.getFileSystem(); // Create two directories, on 1 allowSnapshot through DFS & on other // from HDFS ADMIN Path dir1 = new Path("/dir1"); Path dir2 = new Path("/dir2"); dfs.mkdirs(dir1); dfs.mkdirs(dir2); // AllowSnapshot on dir1 through dfs dfs.allowSnapshot(dir1); // AllowSnapshot on dir2 through dfsadmin DFSAdmin dfsAdmin = new DFSAdmin(conf); ToolRunner.run(conf, dfsAdmin,new String[]{"-allowSnapshot", dir2.toString()}); // Check for trash directory in dir1(allowed through dfs) assertFalse(dfs.exists(new Path(dir1,FileSystem.TRASH_PREFIX))); // (1) // Check for trash directory in dir2(allowed through DfsAdmin) assertTrue(dfs.exists(new Path(dir2,FileSystem.TRASH_PREFIX))); // Failover/Restart namenode and stuff cluster.restartNameNodes(); cluster.waitActive(); // Nothing should change // Check for trash directory in dir1(allowed through dfs) // Will fail here. stuff changed post restart of namenode. Such thing // will happen with uupgrade as well. 
assertFalse(dfs.exists(new Path(dir1,FileSystem.TRASH_PREFIX))); // (1) assertTrue(dfs.exists(new Path(dir2,FileSystem.TRASH_PREFIX))); } } {code} And this fails, And yep there is an ambiguity. {quote}This is important for provisioning snapshot trash to use ordered snapshot deletion feature if the system already had pre existing snapshottable directories. {quote} How come a client side feature that important, that can make the cluster go down in times of critical situation like failover, Again a test to show that: {code:java} @Test public void testFailureAfterFailoverOrRestart() throws Exception { Configuration conf = new HdfsConfiguration(); // Enable the feature conf.setBoolean("dfs.namenode.snapshot.trashroot.enabled", true); try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build()) { cluster.waitActive(); final DistributedFileSystem dfs = cluster.getFileSystem(); // Create a directory Path dir1 = new Path("/dir1"); dfs.mkdirs(dir1); // AllowSnapshot on dir1 dfs.allowSnapshot(dir1); // SetQuota dfs.setQuota(dir1, 1, 1); // Check if the cluster is working and happy. dfs.mkdirs(new Path("/dir2")); assertTrue(dfs.exists(new Path("/dir2"))); // Failover/Restart namenode or such stuff cluster.restartNameNodes(); // Namenode Crashed, It was failover, then // standby would also crash & ultimately whole of cluster. // Will not reach here itself. :-( cluster.waitActive(); dfs.listStatus(new Path("/dir1")); } } {code} {quote}The "getAllSnapshottableDirs()" in itslef is not a heavy call IMO. It does not depend on the no of snapshots present in the system. {quote} Ok If you say, getAllSnapshottableDirs() might not be heavy, even if there are tons of snapshottable directories, So, getFileInfo for all these directories and then mkdirs for all these in worst case. 
So a normal scenario is like:
1 call to getAllSnapshottableDirs -> say it fetches 2 million dirs
2 million getFileInfo() calls -> say in the average case 1 million don't have trash
1 million mkdirs() -> a write call isn't considered cheap & fast; you go to the JNs and stuff.
Even if you get this creation of snapshots into the filesystem spec as well, you still won't get rid of any of these problems. Nevertheless, whatever the case, a normal running cluster shouldn't crash due to any feature, and certainly not during failovers; that is some crazy stuff. And regarding the encryption zone stuff, are you saying it is similar to HDFS-10324 (I see this only linked on HDFS-15607)? Well, I don't think it is doing create-like stuff during startup. Will see if [~weichiu] can confirm that; he worked on it. Didn't dig in much though. Not very sure of the use case and things here, so I would leave it to you guys. Please don't hold anything for me in the
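The back-of-the-envelope numbers in the comment above can be made concrete with a small sketch. It uses the comment's own illustrative assumptions (2 million snapshottable dirs, half of them missing a .Trash child); these are hypothetical figures for the argument, not measurements from a real cluster.

```java
public class TrashProvisioningCost {
    public static void main(String[] args) {
        long snapshottableDirs = 2_000_000L;        // illustrative count from the comment
        long missingTrash = snapshottableDirs / 2;  // assume half lack a .Trash child

        long listingCalls = 1;                      // one getAllSnapshottableDirs-style call
        long getFileInfoCalls = snapshottableDirs;  // one existence check per directory
        long mkdirCalls = missingTrash;             // write ops that must hit the edit log/JNs

        long total = listingCalls + getFileInfoCalls + mkdirCalls;
        System.out.println("RPC-equivalent operations at startup: " + total);
    }
}
```

Three million RPC-scale operations on the startup/failover path is the core of the objection: the write calls in particular are not cheap, since each must be persisted through the journal nodes.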
[jira] [Work started] (HDFS-15968) Improve the log for The DecayRpcScheduler
[ https://issues.apache.org/jira/browse/HDFS-15968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15968 started by Bhavik Patel. --- > Improve the log for The DecayRpcScheduler > -- > > Key: HDFS-15968 > URL: https://issues.apache.org/jira/browse/HDFS-15968 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > > Improve the log for The DecayRpcScheduler to make use of the SLF4J logger > factory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15968) Improve the log for The DecayRpcScheduler
Bhavik Patel created HDFS-15968: --- Summary: Improve the log for The DecayRpcScheduler Key: HDFS-15968 URL: https://issues.apache.org/jira/browse/HDFS-15968 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bhavik Patel Assignee: Bhavik Patel Improve the log for The DecayRpcScheduler to make use of the SLF4J logger factory -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15967) Improve the log for Short Circuit Local Reads
Bhavik Patel created HDFS-15967: --- Summary: Improve the log for Short Circuit Local Reads Key: HDFS-15967 URL: https://issues.apache.org/jira/browse/HDFS-15967 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bhavik Patel Assignee: Bhavik Patel Improve the log for Short Circuit Local Reads -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work started] (HDFS-15967) Improve the log for Short Circuit Local Reads
[ https://issues.apache.org/jira/browse/HDFS-15967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-15967 started by Bhavik Patel. --- > Improve the log for Short Circuit Local Reads > - > > Key: HDFS-15967 > URL: https://issues.apache.org/jira/browse/HDFS-15967 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Bhavik Patel >Assignee: Bhavik Patel >Priority: Minor > > Improve the log for Short Circuit Local Reads -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhanghuazong resolved HDFS-15966. - Resolution: Fixed > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Assignee: zhanghuazong >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
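The change tracked in HDFS-15966 can be illustrated with a minimal, self-contained sketch: a queue-backed statistic must be reset together with the queue it describes. The class, queue and counter below are simplified stand-ins, not the actual LowRedundancyBlocks code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RedundancyQueueSketch {
    // Simplified stand-ins for a low-redundancy block queue and its
    // highestPriorityLowRedundancyReplicatedBlocks statistic.
    static final Deque<String> queue = new ArrayDeque<>();
    static int highestPriorityLowRedundancyReplicatedBlocks = 0;

    static void add(String blockId) {
        queue.add(blockId);
        highestPriorityLowRedundancyReplicatedBlocks++;
    }

    static void clear() {
        queue.clear();
        // The fix: reset the statistic together with the queue, so the
        // metric does not keep reporting blocks that are no longer queued.
        highestPriorityLowRedundancyReplicatedBlocks = 0;
    }

    public static void main(String[] args) {
        add("blk_1");
        add("blk_2");
        clear();
        System.out.println(queue.size() + " "
            + highestPriorityLowRedundancyReplicatedBlocks); // prints: 0 0
    }
}
```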
[jira] [Assigned] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhanghuazong reassigned HDFS-15966: --- Assignee: zhanghuazong > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Assignee: zhanghuazong >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580847 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 09:27 Start Date: 12/Apr/21 09:27 Worklog Time Spent: 10m Work Description: zhangshuyan0 commented on pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#issuecomment-817648020 Failed junit tests have nothing to do with this PR. I ran them locally and all of them passed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580847) Time Spent: 2h 20m (was: 2h 10m) > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, > HDFS-15963.003.patch > > Time Spent: 2h 20m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, > the volume reference obtained by the thread is not released, which causes the > thread trying to remove the volume to wait and fall into an infinite loop. > {code:java} > boolean checkVolumesRemoved() { > Iterator it = volumesBeingRemoved.iterator(); > while (it.hasNext()) { > FsVolumeImpl volume = it.next(); > if (!volume.checkClosed()) { > return false; > } > it.remove(); > } > return true; > } > boolean checkClosed() { > // always be true. 
> if (this.reference.getReferenceCount() > 0) { > FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.", > this, reference.getReferenceCount()); > return false; > } > return true; > } > {code} > At the same time, because the thread has been holding checkDirsLock when > removing the volume, other threads trying to acquire the same lock will be > permanently blocked. > Similar problems also occur in RamDiskAsyncLazyPersistService and > FsDatasetAsyncDiskService. > This patch releases the three previously unreleased volume references. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
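The leak described above can be made concrete with a small self-contained sketch. VolumeReference and BlockSenderSketch below are illustrative stand-ins, not the actual FsVolumeReference and BlockSender types; the point is the pattern the patch applies: release the already-acquired volume reference when construction fails, so the reference count can eventually reach zero and checkClosed() can succeed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Hadoop's FsVolumeReference; illustrative only.
class VolumeReference implements AutoCloseable {
    private static final AtomicInteger refCount = new AtomicInteger();

    VolumeReference() { refCount.incrementAndGet(); }

    @Override
    public void close() { refCount.decrementAndGet(); }

    static int count() { return refCount.get(); }
}

public class BlockSenderSketch {
    // Mirrors the shape of the patched constructor: if opening the block
    // streams fails (e.g. the meta file is missing), the already-acquired
    // volume reference must be released, or checkClosed() never sees the
    // count drop to zero and volume removal spins forever.
    static void openBlock(boolean metaMissing) {
        VolumeReference ref = new VolumeReference();
        try {
            if (metaMissing) {
                throw new RuntimeException("meta file not found");
            }
            // ... the reference would be handed to the replica streams here
            // and closed later, when the BlockSender itself is closed ...
            ref.close();
        } catch (RuntimeException e) {
            ref.close(); // the fix: release the reference on failure
        }
    }

    public static void main(String[] args) {
        openBlock(true);  // failure path
        openBlock(false); // success path
        System.out.println("refCount=" + VolumeReference.count()); // prints: refCount=0
    }
}
```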
[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-15966: -- Labels: pull-request-available (was: ) > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?focusedWorklogId=580846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580846 ] ASF GitHub Bot logged work on HDFS-15966: - Author: ASF GitHub Bot Created on: 12/Apr/21 09:26 Start Date: 12/Apr/21 09:26 Worklog Time Spent: 10m Work Description: langlaile1221 opened a new pull request #2894: URL: https://github.com/apache/hadoop/pull/2894 …ant queue ## NOTICE Please create an issue in ASF JIRA before opening a pull request, and you need to set the title of the pull request which starts with the corresponding JIRA issue number. (e.g. HADOOP-X. Fix a typo in YYY.) For more details, please see https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580846) Remaining Estimate: 0h Time Spent: 10m > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhanghuazong updated HDFS-15966: Description: Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant queue. (was: Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant queue,) > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Priority: Minor > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue
[ https://issues.apache.org/jira/browse/HDFS-15966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhanghuazong updated HDFS-15966: Summary: Empty the statistical parameters when emptying the redundant queue (was: Empty the statistical parameters when emptying the redundant queue, ) > Empty the statistical parameters when emptying the redundant queue > -- > > Key: HDFS-15966 > URL: https://issues.apache.org/jira/browse/HDFS-15966 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: zhanghuazong >Priority: Minor > > Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and > highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant > queue, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15966) Empty the statistical parameters when emptying the redundant queue,
zhanghuazong created HDFS-15966: --- Summary: Empty the statistical parameters when emptying the redundant queue, Key: HDFS-15966 URL: https://issues.apache.org/jira/browse/HDFS-15966 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: zhanghuazong Clear the two indicators highestPriorityLowRedundancyReplicatedBlocks and highestPriorityLowRedundancyReplicatedBlocks when emptying the redundant queue, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319168#comment-17319168 ] Shashikant Banerjee edited comment on HDFS-15614 at 4/12/21, 8:46 AM: -- Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy call IMO. It does not depend on the number of snapshots present in the system. {code:java} 1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes will try this stuff in an attempt to become active and come out of safemode. Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, I can tell you one which I tried: Namespace Quotas, and yep the namenode crashed. can be bunch of such cases {code} If mkdir fails to create the Trash directory inside the snapshot root, then strict ordering/processing of all entries during snapshot deletion cannot be guaranteed. If this feature needs to be used, .Trash needs to be within the snapshottable directory, which is similar to the case with encryption zones. {code:java} 2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin he didn't had any Trash directory in the snapshot dir, Suddenly a failover happened, he would get a trash directory in its snapshot directory, Which he never created.{code} If a new directory is made snapshottable with the feature flag turned on, the .Trash directory gets created implicitly as part of the allowSnapshot call. I don't think there is an ambiguity here. {code:java} Third, The time cost, The namenode startup or the namenode failover or let it be coming out of safemode should be fast, They are actually contributing to cluster down time, and here we are doing like first getSnapshottableDirs which itself would be a heavy call if you have a lot of snapshots, then for each directory, one by one we are doing a getFileInfo and then a mkdir, seems like time-consuming. Not sure about the memory consumption at that point due to this though... 
{code} I don't think getSnapshottableDirs() is a very heavy call in typical setups. It has nothing to do with the number of snapshots that exist in the system. {code:java} Fourth, Why the namenode needs to do a client operation? It is the server. And that too while starting up, This mkdirs from namenode to self is itself suspicious, Bunch of namenode crashing coming up trying to become active, trying to push same edits, Hopefully you would have taken that into account and pretty sure such things won't occur, Namenodes won't collide even in the rarest cases. yep and all safe with the permissions.. {code} This is important for provisioning snapshot trash to use the ordered snapshot deletion feature if the system already has pre-existing snapshottable directories. was (Author: shashikant): Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy call IMO. It does not depend on the number of snapshots present in the system. {code:java} 1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes will try this stuff in an attempt to become active and come out of safemode. Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, I can tell you one which I tried: Namespace Quotas, and yep the namenode crashed. can be bunch of such cases {code} If mkdir fails to create the Trash directory inside the snapshot root, then strict ordering/processing of all entries during snapshot deletion cannot be guaranteed. If this feature needs to be used, .Trash needs to be within the snapshottable directory, which is similar to the case with encryption zones. {code:java} 2. 
Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin he didn't had any Trash directory in the snapshot dir, Suddenly a failover happened, he would get a trash directory in its snapshot directory, Which he never created.{code} If a new directory is made snapshottable with the feature flag turned on, the .Trash directory gets created implicitly as part of the allowSnapshot call. I don't think there is an ambiguity here. {code:java} Third, The time cost, The namenode startup or the namenode failover or let it be coming out of safemode should be fast, They are actually contributing to cluster down time, and here we are doing like first getSnapshottableDirs which itself would be a heavy call if you have a lot of snapshots, then for each directory, one by one we are doing a getFileInfo and then a mkdir, seems like time-consuming. Not sure about the memory consumption at that point due to this though... {code} I don't think getSnapshottableDirs() is a very heavy call in typical setups. It has nothing to do with the number of snapshots that exist in the system. {code:java} Fourth, Why the namenode needs to do a client operation? It is the server. And that too while starting up, This mkdirs from namenode to
[jira] [Commented] (HDFS-15614) Initialize snapshot trash root during NameNode startup if enabled
[ https://issues.apache.org/jira/browse/HDFS-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319168#comment-17319168 ] Shashikant Banerjee commented on HDFS-15614: Thanks [~ayushtkn]. The "getAllSnapshottableDirs()" in itself is not a heavy call IMO. It does not depend on the number of snapshots present in the system. {code:java} 1. What if the mkdirs fail? the namenode will crash, ultimately all Namenodes will try this stuff in an attempt to become active and come out of safemode. Hence all the namenodes will crash. Why mkdirs can fail, could be many reasons, I can tell you one which I tried: Namespace Quotas, and yep the namenode crashed. can be bunch of such cases {code} If mkdir fails to create the Trash directory inside the snapshot root, then strict ordering/processing of all entries during snapshot deletion cannot be guaranteed. If this feature needs to be used, .Trash needs to be within the snapshottable directory, which is similar to the case with encryption zones. {code:java} 2. Secondly, An ambiguity, A client did an allowSnapshot say not from HdfsAdmin he didn't had any Trash directory in the snapshot dir, Suddenly a failover happened, he would get a trash directory in its snapshot directory, Which he never created.{code} If a new directory is made snapshottable with the feature flag turned on, the .Trash directory gets created implicitly as part of the allowSnapshot call. I don't think there is an ambiguity here. {code:java} Third, The time cost, The namenode startup or the namenode failover or let it be coming out of safemode should be fast, They are actually contributing to cluster down time, and here we are doing like first getSnapshottableDirs which itself would be a heavy call if you have a lot of snapshots, then for each directory, one by one we are doing a getFileInfo and then a mkdir, seems like time-consuming. Not sure about the memory consumption at that point due to this though... 
{code} I don't think getSnapshottableDirs() is a very heavy call in typical setups. It has nothing to do with the number of snapshots that exist in the system. {code:java} Fourth, Why the namenode needs to do a client operation? It is the server. And that too while starting up, This mkdirs from namenode to self is itself suspicious, Bunch of namenode crashing coming up trying to become active, trying to push same edits, Hopefully you would have taken that into account and pretty sure such things won't occur, Namenodes won't collide even in the rarest cases. yep and all safe with the permissions.. {code} This is important for provisioning snapshot trash to use the ordered snapshot deletion feature if the system already has pre-existing snapshottable directories. > Initialize snapshot trash root during NameNode startup if enabled > - > > Key: HDFS-15614 > URL: https://issues.apache.org/jira/browse/HDFS-15614 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Siyao Meng >Assignee: Siyao Meng >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This is a follow-up to HDFS-15607. > Goal: > Initialize (create) snapshot trash root for all existing snapshottable > directories if {{dfs.namenode.snapshot.trashroot.enabled}} is set to > {{true}}. So admins won't have to run {{dfsadmin -provisionTrash}} manually > on all those existing snapshottable directories. > The change is expected to land in {{FSNamesystem}}. > Discussion: > 1. Currently in HDFS-15607, the snapshot trash root creation logic is on the > client side. But in order for NN to create it at startup, the logic must > (also) be implemented on the server side as well. -- which is also a > requirement by WebHDFS (HDFS-15612). > 2. Alternatively, we can provide an extra parameter to the > {{-provisionTrash}} command like: {{dfsadmin -provisionTrash -all}} to > initialize/provision trash root on all existing snapshottable dirs. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
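The startup provisioning discussed in this thread can be sketched as follows. This is illustrative only: it uses java.nio.file on the local filesystem in place of the HDFS namespace, and provisionTrashRoots is a hypothetical helper, not the actual FSNamesystem code. The shape matches the description: iterate the existing snapshottable directories and create the .Trash root for each one that lacks it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ProvisionTrashSketch {
    // Illustrative stand-in for the startup provisioning step: for each
    // existing snapshottable directory, create the .Trash root if absent.
    // Returns how many trash roots were newly created.
    static int provisionTrashRoots(List<Path> snapshottableDirs)
            throws IOException {
        int created = 0;
        for (Path dir : snapshottableDirs) {
            Path trash = dir.resolve(".Trash");
            if (!Files.isDirectory(trash)) {
                // On a real cluster this mkdir can fail (e.g. namespace
                // quota), which is exactly the failure mode debated above.
                Files.createDirectories(trash);
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("snapshots");
        Path d1 = Files.createDirectory(root.resolve("warehouse"));
        Path d2 = Files.createDirectory(root.resolve("users"));
        Files.createDirectory(d2.resolve(".Trash")); // already provisioned
        int created = provisionTrashRoots(List.of(d1, d2));
        System.out.println("created=" + created); // prints: created=1
    }
}
```

Idempotence is the relevant design property here: re-running the provisioning on an already-provisioned directory creates nothing, so repeated failovers do not pile up edits.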
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580826=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580826 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 08:16 Start Date: 12/Apr/21 08:16 Worklog Time Spent: 10m Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611418779 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java ## @@ -1805,4 +1806,38 @@ public void testNotifyNamenodeMissingOrNewBlock() throws Exception { cluster.shutdown(); } } + + @Test Review comment: Thanks, I'll add it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580826) Time Spent: 2h 10m (was: 2h) > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, > HDFS-15963.003.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, > the volume reference obtained by the thread is not released, which causes the > thread trying to remove the volume to wait and fall into an infinite loop. 
> {code:java} > boolean checkVolumesRemoved() { > Iterator it = volumesBeingRemoved.iterator(); > while (it.hasNext()) { > FsVolumeImpl volume = it.next(); > if (!volume.checkClosed()) { > return false; > } > it.remove(); > } > return true; > } > boolean checkClosed() { > // always be true. > if (this.reference.getReferenceCount() > 0) { > FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.", > this, reference.getReferenceCount()); > return false; > } > return true; > } > {code} > At the same time, because the thread has been holding checkDirsLock when > removing the volume, other threads trying to acquire the same lock will be > permanently blocked. > Similar problems also occur in RamDiskAsyncLazyPersistService and > FsDatasetAsyncDiskService. > This patch releases the three previously unreleased volume references. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580819 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 07:59 Start Date: 12/Apr/21 07:59 Worklog Time Spent: 10m Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611407579 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java ## @@ -562,4 +565,57 @@ void writeBlock(ExtendedBlock block, BlockConstructionStage stage, checksum, CachingStrategy.newDefaultStrategy(), false, false, null, null, new String[0]); } + + @Test + public void testReleaseVolumeRefIfExceptionThrown() throws IOException { +Path file = new Path("dataprotocol.dat"); +int numDataNodes = 1; + +Configuration conf = new HdfsConfiguration(); +conf.setInt(DFSConfigKeys.DFS_REPLICATION_KEY, numDataNodes); +MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).numDataNodes( +numDataNodes).build(); +try { + cluster.waitActive(); + datanode = cluster.getFileSystem().getDataNodeStats( + DatanodeReportType.LIVE)[0]; + dnAddr = NetUtils.createSocketAddr(datanode.getXferAddr()); + FileSystem fileSys = cluster.getFileSystem(); + + int fileLen = Math.min( + conf.getInt(DFSConfigKeys.DFS_BLOCK_SIZE_KEY, 4096), 4096); + + DFSTestUtil.createFile(fileSys, file, fileLen, fileLen, + fileSys.getDefaultBlockSize(file), + fileSys.getDefaultReplication(file), 0L); + + // get the first blockid for the file + final ExtendedBlock firstBlock = DFSTestUtil.getFirstBlock(fileSys, file); + + String bpid = cluster.getNamesystem().getBlockPoolId(); + ExtendedBlock blk = new ExtendedBlock(bpid, firstBlock.getLocalBlock()); + sendBuf.reset(); + recvBuf.reset(); + + // delete the meta file to create a exception in BlockSender constructor + DataNode dn = cluster.getDataNodes().get(0); + 
cluster.getMaterializedReplica(0, blk).deleteMeta(); + + FsVolumeImpl volume = (FsVolumeImpl) DataNodeTestUtils.getFSDataset( + dn).getVolume(blk); + int beforeCnt = volume.getReferenceCount(); + + sender.copyBlock(blk, BlockTokenSecretManager.DUMMY_TOKEN); + sendRecvData("Copy a block.", false); + Thread.sleep(1000); + + int afterCnt = volume.getReferenceCount(); + assertEquals(beforeCnt, afterCnt); Review comment: I confirmed that this case has been handled. The reference will be closed when we close the corresponding BlockSender. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580819) Time Spent: 2h (was: 1h 50m) > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, > HDFS-15963.003.patch > > Time Spent: 2h > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, > the volume reference obtained by the thread is not released, which causes the > thread trying to remove the volume to wait and fall into an infinite loop. > {code:java} > boolean checkVolumesRemoved() { > Iterator it = volumesBeingRemoved.iterator(); > while (it.hasNext()) { > FsVolumeImpl volume = it.next(); > if (!volume.checkClosed()) { > return false; > } > it.remove(); > } > return true; > } > boolean checkClosed() { > // always be true. 
> if (this.reference.getReferenceCount() > 0) { > FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.", > this, reference.getReferenceCount()); > return false; > } > return true; > } > {code} > At the same time, because the thread has been holding checkDirsLock when > removing the volume, other threads trying to acquire the same lock will be > permanently blocked. > Similar problems also occur in RamDiskAsyncLazyPersistService and > FsDatasetAsyncDiskService. > This patch releases the three previously unreleased volume references. -- This
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580817 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 07:49 Start Date: 12/Apr/21 07:49 Worklog Time Spent: 10m Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611401338 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetAsyncDiskService.java ## @@ -167,18 +167,26 @@ synchronized long countPendingDeletions() { * Execute the task sometime in the future, using ThreadPools. */ synchronized void execute(FsVolumeImpl volume, Runnable task) { -if (executors == null) { - throw new RuntimeException("AsyncDiskService is already shutdown"); -} -if (volume == null) { - throw new RuntimeException("A null volume does not have a executor"); -} -ThreadPoolExecutor executor = executors.get(volume.getStorageID()); -if (executor == null) { - throw new RuntimeException("Cannot find volume " + volume - + " for execution of task " + task); -} else { - executor.execute(task); +try { Review comment: The clean up code is in the finally block, so it will be executed even if an exception occurs. Thanks for your suggestions, I will fix it to make the style consistent. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580817) Time Spent: 1h 50m (was: 1h 40m) > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, > HDFS-15963.003.patch > > Time Spent: 1h 50m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, > the volume reference obtained by the thread is not released, which causes the > thread trying to remove the volume to wait and fall into an infinite loop. > {code:java} > boolean checkVolumesRemoved() { > Iterator it = volumesBeingRemoved.iterator(); > while (it.hasNext()) { > FsVolumeImpl volume = it.next(); > if (!volume.checkClosed()) { > return false; > } > it.remove(); > } > return true; > } > boolean checkClosed() { > // always be true. > if (this.reference.getReferenceCount() > 0) { > FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.", > this, reference.getReferenceCount()); > return false; > } > return true; > } > {code} > At the same time, because the thread has been holding checkDirsLock when > removing the volume, other threads trying to acquire the same lock will be > permanently blocked. > Similar problems also occur in RamDiskAsyncLazyPersistService and > FsDatasetAsyncDiskService. > This patch releases the three previously unreleased volume references. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
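The shape of the change discussed in this review thread, sketched as stand-alone code rather than the actual Hadoop classes (the executor map, reference counter and storage ids below are simplified stand-ins): when task submission itself fails, the volume reference the task was holding must be released before the exception propagates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncExecuteSketch {
    static final AtomicInteger refCount = new AtomicInteger();

    // Stand-in for the per-volume executor map keyed by storage id.
    static final Map<String, Executor> executors = new HashMap<>();

    // Mirrors the pattern of the patched execute(): if submission throws
    // (unknown volume, service already shut down), release the volume
    // reference held by the task; otherwise the volume-removal thread
    // waits forever for the reference count to reach zero.
    static void execute(String storageId, Runnable task) {
        try {
            Executor executor = executors.get(storageId);
            if (executor == null) {
                throw new RuntimeException("Cannot find volume " + storageId);
            }
            executor.execute(task);
        } catch (RuntimeException e) {
            refCount.decrementAndGet(); // release the reference on failure
            throw e;
        }
    }

    public static void main(String[] args) {
        refCount.incrementAndGet(); // the deletion task holds a reference
        try {
            execute("missing-volume", () -> { /* would delete replica files */ });
        } catch (RuntimeException expected) {
            // submission failed, but the reference was still released
        }
        System.out.println("refCount=" + refCount.get()); // prints: refCount=0
    }
}
```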
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580810=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580810 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 07:34 Start Date: 12/Apr/21 07:34 Worklog Time Spent: 10m Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611391944 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java ## @@ -432,6 +432,7 @@ ris = new ReplicaInputStreams( blockIn, checksumIn, volumeRef, fileIoProvider); } catch (IOException ioe) { + IOUtils.cleanupWithLogger(null, volumeRef); Review comment: If there are no exceptions, the reference will be closed when its BlockSender is closed. I checked that the code that constructs the BlockSender closed it after it was used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 580810) Time Spent: 1h 40m (was: 1.5h) > Unreleased volume references cause an infinite loop > --- > > Key: HDFS-15963 > URL: https://issues.apache.org/jira/browse/HDFS-15963 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, > HDFS-15963.003.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > When BlockSender throws an exception because the meta-data cannot be found, > the volume reference obtained by the thread is not released, which causes the > thread trying to remove the volume to wait and fall into an infinite loop. 
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
>
> boolean checkClosed() {
>   // always be true.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread has been holding checkDirsLock when removing the volume, other threads trying to acquire the same lock will be permanently blocked.
> Similar problems also occur in RamDiskAsyncLazyPersistService and FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
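The leak described in the issue can be sketched in isolation. The following is a minimal, self-contained illustration, not the HDFS classes themselves (all names here are hypothetical): a volume's reference count must drop back to zero before checkClosed() succeeds, so a constructor that obtains a reference and then throws without releasing it leaves the count stuck above zero forever.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for FsVolumeImpl's reference counting; not HDFS code.
class VolumeSketch {
    final AtomicInteger refCount = new AtomicInteger();

    AutoCloseable obtainReference() {
        refCount.incrementAndGet();
        // Releasing the reference just decrements the count.
        return refCount::decrementAndGet;
    }

    // Removal can only proceed once every obtained reference was released.
    boolean checkClosed() {
        return refCount.get() == 0;
    }
}

public class RefLeakSketch {
    // Mirrors the shape of the fix: release the obtained reference on the
    // exception path, analogous to IOUtils.cleanupWithLogger(null, volumeRef)
    // in the BlockSender diff above.
    static void openSender(VolumeSketch vol, boolean failConstruction) throws Exception {
        AutoCloseable ref = vol.obtainReference();
        try {
            if (failConstruction) {
                throw new IOException("meta-data cannot be found");
            }
            // Normal path: the sender owns the reference and closes it when done.
        } catch (IOException ioe) {
            ref.close(); // without this, checkClosed() would never return true
            return;
        }
        ref.close(); // stands in for the sender releasing the reference on close
    }

    public static void main(String[] args) throws Exception {
        VolumeSketch vol = new VolumeSketch();
        openSender(vol, true);
        System.out.println(vol.checkClosed()); // prints "true": no leaked reference
    }
}
```

With the `ref.close()` in the catch removed, the count stays at 1 after the failed open, which is exactly the state that keeps the checkVolumesRemoved() loop spinning.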
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580803 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 07:23 Start Date: 12/Apr/21 07:23 Worklog Time Spent: 10m
Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611385242
## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/RamDiskAsyncLazyPersistService.java
## @@ -153,16 +154,24 @@ synchronized boolean queryVolume(FsVolumeImpl volume) {
    * Execute the task sometime in the future, using ThreadPools.
    */
   synchronized void execute(String storageId, Runnable task) {
-    if (executors == null) {
-      throw new RuntimeException(
-          "AsyncLazyPersistService is already shutdown");
-    }
-    ThreadPoolExecutor executor = executors.get(storageId);
-    if (executor == null) {
-      throw new RuntimeException("Cannot find root storage volume with id " +
-          storageId + " for execution of task " + task);
-    } else {
-      executor.execute(task);
+    try {
Review comment: Yes, it is. But a RuntimeException will be caught here only when the task has not been executed. Once the task executes inside the try ... with block, all exceptions will be caught there.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 580803) Time Spent: 1.5h (was: 1h 20m)
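The execute() rewrite under review moves the caller-side release into a catch block: when the service is already shut down, the RuntimeException fires before the task ever runs, so the reference obtained for the task must be released by the submitter. A hedged sketch of that pattern, using simplified names and a plain ExecutorService in place of the per-storage executor map (this is not the RamDiskAsyncLazyPersistService API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of the "release on both paths" pattern; names are invented.
public class SubmitSketch {
    // One reference is taken per submitted task; whichever path runs
    // (the task body or the rejection handler) must release it exactly once.
    static void submitWithRef(ExecutorService executor, AtomicInteger refCount,
                              Runnable task) {
        refCount.incrementAndGet();
        try {
            executor.execute(() -> {
                try {
                    task.run();
                } finally {
                    refCount.decrementAndGet(); // released when the task finishes
                }
            });
        } catch (RejectedExecutionException e) {
            // The task never ran, so its reference must be released here; this
            // is the case the reviewer describes as "the task is not executed".
            refCount.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        AtomicInteger refCount = new AtomicInteger();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown(); // force the rejection path
        submitWithRef(executor, refCount, () -> {});
        System.out.println(refCount.get()); // prints "0": no leak on rejection
    }
}
```

Note the HDFS code throws RuntimeException from its own shutdown checks rather than relying on RejectedExecutionException; the standard-library exception is used here only to keep the sketch self-contained.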
[jira] [Work logged] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?focusedWorklogId=580801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-580801 ] ASF GitHub Bot logged work on HDFS-15963: - Author: ASF GitHub Bot Created on: 12/Apr/21 07:15 Start Date: 12/Apr/21 07:15 Worklog Time Spent: 10m
Work Description: zhangshuyan0 commented on a change in pull request #2889: URL: https://github.com/apache/hadoop/pull/2889#discussion_r611380793
## File path: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/TestFsDatasetImpl.java
## @@ -1805,4 +1806,38 @@ public void testNotifyNamenodeMissingOrNewBlock() throws Exception {
     cluster.shutdown();
   }
 }
+
+  @Test
+  public void testReleaseVolumeRefIfExceptionThrown() throws IOException {
+    MiniDFSCluster cluster = new MiniDFSCluster.Builder(
+        new HdfsConfiguration()).build();
+    cluster.waitActive();
+    FsVolumeImpl vol = (FsVolumeImpl) dataset.getFsVolumeReferences().get(0);
+    ExtendedBlock eb;
+    ReplicaInfo info;
+    int beforeCnt = 0;
+    try {
+      List<ReplicaInfo> blockList = new ArrayList<>();
+      eb = new ExtendedBlock(BLOCKPOOL, 1, 1, 1001);
+      info = new FinalizedReplica(
+          eb.getLocalBlock(), vol, vol.getCurrentDir().getParentFile());
+      dataset.volumeMap.add(BLOCKPOOL, info);
+      ((LocalReplica) info).getBlockFile().createNewFile();
+      ((LocalReplica) info).getMetaFile().createNewFile();
+      blockList.add(info);
+
+      // Create a runtime exception
+      dataset.asyncDiskService.shutdown();
+
+      beforeCnt = vol.getReferenceCount();
+      dataset.invalidate(BLOCKPOOL, blockList.toArray(new Block[0]));
+
+    } catch (RuntimeException re) {
+      int afterCnt = vol.getReferenceCount();
+      assertEquals(beforeCnt, afterCnt);
+      re.printStackTrace();
Review comment: Ok, I'll remove it.
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Issue Time Tracking --- Worklog Id: (was: 580801) Time Spent: 1h 20m (was: 1h 10m)