[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800266#comment-17800266 ]

ASF GitHub Bot commented on HDFS-17300:
---

LiuGuH opened a new pull request, #6383:
URL: https://github.com/apache/hadoop/pull/6383

   …ateid is always delayed with Active Namenode for a period of time

   ### Description of PR
   Currently, when an Observer NameNode is used and a client's stateid is delayed, the RPC server requeues the call into the call queue. If the EditLogTailer is broken or something else goes wrong, the call is requeued again and again. So the Observer should throw ObserverRetryOnActiveException if its stateid stays delayed relative to the Active NameNode for a configured time.

> [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is
> always delayed with Active Namenode for a configured time
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17300
>                 URL: https://issues.apache.org/jira/browse/HDFS-17300
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: liuguanghua
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when an Observer NameNode is used and a client's stateid is delayed,
> the RPC server requeues the call into the call queue. If the EditLogTailer is
> broken or something else goes wrong, the call is requeued again and again.
> So the Observer should throw ObserverRetryOnActiveException if its stateid
> stays delayed relative to the Active NameNode for a configured time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
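The timeout logic the PR proposes can be sketched in isolation. This is a minimal, self-contained illustration under stated assumptions: the class name, method name, and millisecond bookkeeping are hypothetical stand-ins for the NameNode's actual RPC requeue path, and the exception class is a local stub for Hadoop's ObserverRetryOnActiveException, not the real one.

```java
// Hypothetical sketch: track how long the observer's applied stateid has
// lagged behind the stateid a client is waiting for, and signal
// "retry on active" once the lag outlives a configured threshold,
// instead of requeueing the call indefinitely.
public class ObserverRetrySketch {
    /** Local stand-in for Hadoop's ObserverRetryOnActiveException. */
    static class ObserverRetryOnActiveException extends RuntimeException {
        ObserverRetryOnActiveException(String msg) { super(msg); }
    }

    private final long maxLagMillis;   // configured tolerance for lag
    private long lagStartMillis = -1;  // when the current lag episode began

    ObserverRetrySketch(long maxLagMillis) {
        this.maxLagMillis = maxLagMillis;
    }

    /**
     * Called for each queued read. Returns true if the call should be
     * requeued to wait for the edit log tail to catch up; returns false
     * when the observer has caught up and can serve the read.
     */
    boolean shouldRequeue(long clientStateId, long serverStateId, long nowMillis) {
        if (serverStateId >= clientStateId) {
            lagStartMillis = -1;       // caught up: reset the lag timer
            return false;              // serve the read on the observer
        }
        if (lagStartMillis < 0) {
            lagStartMillis = nowMillis; // lag episode just started
        }
        if (nowMillis - lagStartMillis >= maxLagMillis) {
            throw new ObserverRetryOnActiveException(
                "Observer lagged " + (nowMillis - lagStartMillis)
                + " ms behind client stateid; retry on Active");
        }
        return true;                   // still within tolerance: requeue
    }
}
```

The key design point matches the issue text: requeueing stays the fast path for transient lag, and the exception only fires once a whole lag episode (not a single call) exceeds the configured time.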
[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800264#comment-17800264 ]

ASF GitHub Bot commented on HDFS-17300:
---

LiuGuH closed pull request #6382: HDFS-17300. [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
URL: https://github.com/apache/hadoop/pull/6382
[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800263#comment-17800263 ]

ASF GitHub Bot commented on HDFS-17300:
---

LiuGuH opened a new pull request, #6382:
URL: https://github.com/apache/hadoop/pull/6382

   ### Description of PR
   Currently, when an Observer NameNode is used and a client's stateid is delayed, the RPC server requeues the call into the call queue. If the EditLogTailer is broken or something else goes wrong, the call is requeued again and again. So the Observer should throw ObserverRetryOnActiveException if its stateid stays delayed relative to the Active NameNode for a configured time.
[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800262#comment-17800262 ]

ASF GitHub Bot commented on HDFS-17302:
---

KeeProMise commented on code in PR #6380:
URL: https://github.com/apache/hadoop/pull/6380#discussion_r1435995958

##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/fairness/ProportionRouterRpcFairnessPolicyController.java:
##########
@@ -0,0 +1,76 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.fairness;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.server.federation.router.FederationUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Set;
+
+import static org.apache.hadoop.hdfs.server.federation.fairness.RouterRpcFairnessConstants.CONCURRENT_NS;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_FAIR_HANDLER_PROPORTION_DEFAULT;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_FAIR_HANDLER_PROPORTION_KEY_PREFIX;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_HANDLER_COUNT_DEFAULT;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_HANDLER_COUNT_KEY;
+
+/**
+ * Proportion fairness policy extending {@link AbstractRouterRpcFairnessPolicyController},
+ * fetching the proportion of handlers from configuration for all available name services
+ * and, based on the proportion and the total number of handlers, calculating the handlers
+ * for each ns. The handler count will not change for this controller.
+ */
+public class ProportionRouterRpcFairnessPolicyController extends
+    AbstractRouterRpcFairnessPolicyController {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ProportionRouterRpcFairnessPolicyController.class);
+
+  public ProportionRouterRpcFairnessPolicyController(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    // Total handlers configured to process all incoming RPCs.
+    int handlerCount = conf.getInt(DFS_ROUTER_HANDLER_COUNT_KEY, DFS_ROUTER_HANDLER_COUNT_DEFAULT);
+
+    LOG.info("Handlers available for fairness assignment {} ", handlerCount);
+
+    // Get all name services configured
+    Set<String> allConfiguredNS = FederationUtil.getAllConfiguredNS(conf);
+
+    // Insert the concurrent nameservice into the set to process together
+    allConfiguredNS.add(CONCURRENT_NS);
+    for (String nsId : allConfiguredNS) {

Review Comment:
   You can take a look at https://issues.apache.org/jira/browse/HDFS-17302 for a detailed description.

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> -----------------------------------------------------------------------
>
>                 Key: HDFS-17302
>                 URL: https://issues.apache.org/jira/browse/HDFS-17302
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: rbf
>            Reporter: Jian Zhang
>            Assignee: Jian Zhang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch
[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800261#comment-17800261 ]

ASF GitHub Bot commented on HDFS-17300:
---

LiuGuH closed pull request #6376: HDFS-17300. [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
URL: https://github.com/apache/hadoop/pull/6376
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800260#comment-17800260 ]

ASF GitHub Bot commented on HDFS-17056:
---

huangzhaobo99 commented on code in PR #6379:
URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435991712

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java:
##########
@@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException {
         throw e;
       }
     } else {
+      if (args.size() > 0) {
+        System.err.println(getName() + ": Too many arguments");

Review Comment:
   @ayushtkn, that won't change for now. Thanks for your guidance!

> EC: Fix verifyClusterSetup output in case of an invalid param.
> --------------------------------------------------------------
>
>                 Key: HDFS-17056
>                 URL: https://issues.apache.org/jira/browse/HDFS-17056
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>            Reporter: Ayush Saxena
>            Assignee: huangzhaobo99
>            Priority: Major
>              Labels: newbie, pull-request-available
>
> {code:java}
> bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k,
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy followed by the names of policies; otherwise
> it defaults to all enabled policies.
> If there are additional invalid options, it silently ignores them, unlike
> other EC commands, which throw a Too Many Arguments exception.
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian Zhang updated HDFS-17302:
------------------------------
    Description: 
h2. Current shortcomings

[HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different nameservices (ns). The StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, so that an ns with a higher load does not affect the router's access to an ns with a normal load. But it still falls short in several ways:

1. *Configuration is inconvenient and error-prone*: To use the StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total and how many nameservices it currently serves, and then carefully calculate how many handlers to allocate to each ns so that the sum over all ns does not exceed the router's total, while also considering how many handlers each ns needs for good performance. Configuring even one handler too many for a single ns, so that the total exceeds the number of handlers the router owns, causes the router to fail to start; I then have to investigate why the router failed to start and, after finding the reason, reconsider the handler count for each ns. In addition, whenever I reconfigure the router's total handler count, I have to re-allocate handlers to every ns, which increases the complexity of operation and maintenance.

2. *Adding a new ns is not supported*: While the router is running, if a new ns is added to the cluster and a mount is created for it, the ns cannot be accessed through the router because no handlers are allocated to it. We must reconfigure the handler counts and then refresh the configuration before the router can access the ns normally; reconfiguring the handler counts again runs into shortcoming 1: configuration is inconvenient and error-prone.

3. *Wasted handlers*: The main purpose of the RouterRpcFairnessPolicyController is to let the router access ns with normal load without being affected by ns with higher load. First, not all ns have high load; second, an ns with high load is not busy 24 hours a day. It may be busy only during certain periods, such as 0:00 to 8:00, and normally loaded the rest of the time. Assume there are two ns, each allocated half of the handlers; ns1 receives many requests from 0:00 to 14:00 and almost none from 14:00 to 24:00, while ns2 receives many requests from 12:00 to 24:00 and almost none before that. Between 0:00 and 12:00 and between 14:00 and 24:00, only one ns is receiving requests while the other is nearly idle, so half of the handlers are wasted.

4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController supports only isolation, not sharing. Isolation is just a means to improve the router's access to normally loaded ns, not the goal. It is rare for all ns in a cluster to have high load; on the contrary, in most scenarios only a few ns are heavily loaded while the load on the others is normal. For an ns with higher load and an ns with normal load, we need to isolate their handlers so that the busier ns does not hurt the performance of the less busy one. However, nameservices that are all under normal load, or all under higher load, do not need to be isolated from each other; such ns of the same nature can share the router's handlers. This performs better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router.

h2. New features

Because the StaticRouterRpcFairnessPolicyController has the above deficiencies in usability and performance, I provide a new RouterRpcFairnessPolicyController, ProportionRouterRpcFairnessPolicyController (maybe with a better name), to address the major shortcomings:

1. *More user-friendly configuration*: Handlers are allocated to each ns proportionally. For example, if we give ns1 a handler ratio of 0.2, ns1 uses 0.2 of the router's total handlers. With this method, we do not need to know in advance how many handlers the router has.

2. *Sharing and isolation*: Sharing is as important as isolation. The sum of the handlers of all ns may exceed the total number of handlers. For example, assuming we have 10 handlers and 3 ns, we can allocate 5 (0.5) handlers to ns1, 5 (0.5) handlers to ns2, and
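The proportional allocation described above can be sketched in a few lines. This is a hypothetical illustration, not the actual ProportionRouterRpcFairnessPolicyController: it assumes permits are computed as ceil(proportion × total handlers) with a floor of one handler per nameservice, and it deliberately lets the configured proportions sum past 1.0 so that nameservices share handlers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProportionSketch {
    /** Compute per-nameservice handler permits from configured proportions. */
    static Map<String, Integer> allocate(int totalHandlers, Map<String, Double> proportions) {
        Map<String, Integer> permits = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : proportions.entrySet()) {
            // ceil so a small proportion still rounds up to a whole handler;
            // floor of 1 so every configured nameservice can make progress.
            int n = Math.max(1, (int) Math.ceil(e.getValue() * totalHandlers));
            permits.put(e.getKey(), n);
        }
        return permits;
    }
}
```

With 10 handlers and proportions ns1=0.5, ns2=0.5, ns3=0.5, every ns gets 5 permits (15 in total), which is the over-subscription ("sharing") the description argues for; if the proportions sum to at most 1.0, the behavior degenerates to strict isolation.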
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800259#comment-17800259 ]

ASF GitHub Bot commented on HDFS-17056:
---

ayushtkn commented on code in PR #6379:
URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435987418

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java:
##########
@@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException {
         throw e;
      }
    } else {
+      if (args.size() > 0) {
+        System.err.println(getName() + ": Too many arguments");

Review Comment:
   We can change it, but it won't be consistent with others:
   https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java#L115-L117
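The guard discussed in this review can be isolated into a tiny sketch. The helper below is hypothetical (the real ECAdmin command does considerably more); it only shows the pattern being added: leftover positional arguments become an explicit error instead of being silently ignored.

```java
import java.util.List;

public class ArgCheckSketch {
    /**
     * Returns an error message for leftover positional arguments,
     * or null if the argument list is acceptable. Mirrors the
     * "Too many arguments" convention of the other EC subcommands.
     */
    static String checkLeftoverArgs(String commandName, List<String> args) {
        if (!args.isEmpty()) {
            return commandName + ": Too many arguments";
        }
        return null;  // nothing left over: proceed with default behavior
    }
}
```

For example, `hdfs ec -verifyClusterSetup XOR-2-1-1024k` without `-policy` leaves `XOR-2-1-1024k` as an unconsumed argument, which this check turns into an error rather than quietly verifying all enabled policies.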
[jira] [Commented] (HDFS-17300) [SBN READ] Observer should throw ObserverRetryOnActiveException if stateid is always delayed with Active Namenode for a configured time
[ https://issues.apache.org/jira/browse/HDFS-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800257#comment-17800257 ]

ASF GitHub Bot commented on HDFS-17300:
---

hadoop-yetus commented on PR #6376:
URL: https://github.com/apache/hadoop/pull/6376#issuecomment-1868814964

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 21s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 13m 51s | | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 19m 1s | | trunk passed |
   | +1 :green_heart: | compile | 8m 14s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | compile | 7m 26s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | checkstyle | 2m 1s | | trunk passed |
   | +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
   | +1 :green_heart: | javadoc | 1m 21s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 3m 11s | | trunk passed |
   | +1 :green_heart: | shadedclient | 20m 47s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 21s | | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 1m 7s | | the patch passed |
   | +1 :green_heart: | compile | 7m 51s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javac | 7m 51s | | the patch passed |
   | +1 :green_heart: | compile | 7m 22s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | javac | 7m 22s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 1s | | The patch has no blanks issues. |
   | -0 :warning: | checkstyle | 1m 55s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6376/2/artifact/out/results-checkstyle-root.txt) | root: The patch generated 3 new + 267 unchanged - 0 fixed = 270 total (was 267) |
   | +1 :green_heart: | mvnsite | 1m 42s | | the patch passed |
   | +1 :green_heart: | javadoc | 1m 20s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 39s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | +1 :green_heart: | spotbugs | 3m 18s | | the patch passed |
   | +1 :green_heart: | shadedclient | 20m 45s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 15m 47s | | hadoop-common in the patch passed. |
   | -1 :x: | unit | 200m 54s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6376/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 38s | | The patch does not generate ASF License warnings. |
   | | | 345m 51s | | |

   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.TestErasureCodingPoliciesWithRandomECPolicy |
   | | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   | | hadoop.hdfs.TestDFSStripedOutputStream |
   | | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6376/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6376 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 59544ada24be 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool |
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description:

h2. Current shortcomings

[HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController that supports configuring a different number of handlers for each nameservice (ns). With it, the router isolates nameservices from one another, so an ns under heavy load does not affect the router's access to an ns under normal load. But the StaticRouterRpcFairnessPolicyController still falls short in several ways:

1. *Configuration is inconvenient and error-prone*: To use the StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total and how many nameservices it currently serves, then carefully compute how many handlers to allocate to each ns so that the sum across all ns does not exceed the router's total, while also weighing how many handlers each ns needs for good performance. If I configure even one handler too many for some ns, so that the sum exceeds the router's total, the router fails to start; I then have to investigate the startup failure and rework the per-ns handler counts.

2. *Adding a nameservice is not supported*: While the router is running, if a new ns is added to the cluster and a mount is created for it, the ns cannot be accessed through the router because no handlers are allocated to it. We must reconfigure the handler counts and refresh the configuration before the router can reach the ns, which brings us back to shortcoming 1: configuration is inconvenient and error-prone.

3. *Handlers are wasted*: The main purpose of a RouterRpcFairnessPolicyController is to let the router keep serving nameservices under normal load without being affected by those under heavy load. First, not all ns carry high load; second, a heavily loaded ns is not busy 24 hours a day. It may be busy only during certain periods, say 0:00 to 8:00, and normal otherwise. Suppose there are 2 ns, each allocated half of the handlers: ns1 gets many requests from 0:00 to 14:00 and almost none from 14:00 to 24:00, while ns2 gets many requests from 12:00 to 24:00 and almost none from 0:00 to 14:00. Then between 0:00 and 12:00, and again between 14:00 and 24:00, only one ns is busy while the other is nearly idle, so half of the handlers sit wasted.

4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController supports only isolation, not sharing. Isolation is just a means to keep router access to normally loaded ns fast, not the goal. It is rare for every ns in a cluster to carry high load; in most scenarios only a few ns are busy while the rest are normal. We need to isolate the handlers of heavily loaded ns from those of normally loaded ns so the former cannot degrade the latter. But nameservices that are all under normal load (or all under heavy load) do not need isolation from each other; ns of the same character can share the router's handlers. That performs better than pinning a fixed handler count to each ns, because each ns can draw on all of the router's handlers.

h2. New features

Given the usability and performance deficiencies of the StaticRouterRpcFairnessPolicyController above, I provide a new RouterRpcFairnessPolicyController, ProportionRouterRpcFairnessPolicyController (maybe with a better name), to address the major shortcomings:

1. *More user-friendly configuration*: handlers are allocated to each ns by proportion. For example, if we give ns1 a proportion of 0.2, ns1 uses 0.2 of the router's total handler count; with this method we do not need to know in advance how many handlers the router has.

2. *Sharing and isolation*: sharing is as important as isolation, so we allow the sum of handlers across all ns to exceed the total number of handlers. For example, with 10 handlers and 3 ns, we can allocate 5 (0.5) handlers to ns1, 5 (0.5) to ns2, and 5 (0.5) to ns3. This feature is very important. Consider the following scenarios: - Only one ns is busy during a period of time: Assume that ns1 has more requests from 0 to 8
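The proportional, oversubscribable allocation described in point 2 can be sketched as follows. This is an illustrative sketch, not the patch's actual code; the `allocate` helper, the truncation rule, and the one-permit-per-ns floor are assumptions made for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ProportionAllocationSketch {
    // Turn per-nameservice proportions into permit counts. The proportions may
    // sum to more than 1.0: each ns is capped individually (isolation), yet
    // together the nameservices can oversubscribe the router's handlers
    // (sharing), so an idle ns's capacity is not stranded.
    static Map<String, Integer> allocate(int totalHandlers, Map<String, Double> proportions) {
        Map<String, Integer> permits = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : proportions.entrySet()) {
            // Truncate toward zero, but grant at least one permit per ns.
            permits.put(e.getKey(), Math.max(1, (int) (totalHandlers * e.getValue())));
        }
        return permits;
    }

    public static void main(String[] args) {
        Map<String, Double> p = new LinkedHashMap<>();
        p.put("ns1", 0.5);
        p.put("ns2", 0.5);
        p.put("ns3", 0.5);
        // 10 handlers total, yet ns1+ns2+ns3 may hold up to 15 permits combined.
        System.out.println(allocate(10, p)); // {ns1=5, ns2=5, ns3=5}
    }
}
```

Under this scheme the router's actual handler count still bounds concurrency; the per-ns permit counts only bound how much of it any single ns may claim at once.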
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Summary: RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation. (was: RBF: ProportionRouterRpcFairnessPolicyController-Sharing and Isolating.)

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> ---
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch
[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800255#comment-17800255 ] ASF GitHub Bot commented on HDFS-17302: --- huangzhaobo99 commented on code in PR #6380: URL: https://github.com/apache/hadoop/pull/6380#discussion_r1435977171

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/fairness/ProportionRouterRpcFairnessPolicyController.java:
## @@ -0,0 +1,76 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs.server.federation.fairness;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hdfs.server.federation.router.FederationUtil;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Set;
+
+import static org.apache.hadoop.hdfs.server.federation.fairness.RouterRpcFairnessConstants.CONCURRENT_NS;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_FAIR_HANDLER_PROPORTION_DEFAULT;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_FAIR_HANDLER_PROPORTION_KEY_PREFIX;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_HANDLER_COUNT_DEFAULT;
+import static org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys.DFS_ROUTER_HANDLER_COUNT_KEY;
+
+/**
+ * Proportion fairness policy extending {@link AbstractRouterRpcFairnessPolicyController},
+ * fetching the proportion of handlers from configuration for all available name services;
+ * based on each proportion and the total number of handlers, it calculates the handler
+ * count for every ns. The handler counts do not change for this controller.
+ */
+public class ProportionRouterRpcFairnessPolicyController extends
+    AbstractRouterRpcFairnessPolicyController {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(ProportionRouterRpcFairnessPolicyController.class);
+
+  public ProportionRouterRpcFairnessPolicyController(Configuration conf) {
+    init(conf);
+  }
+
+  @Override
+  public void init(Configuration conf) {
+    super.init(conf);
+    // Total handlers configured to process all incoming Rpc.
+    int handlerCount = conf.getInt(DFS_ROUTER_HANDLER_COUNT_KEY, DFS_ROUTER_HANDLER_COUNT_DEFAULT);
+
+    LOG.info("Handlers available for fairness assignment {} ", handlerCount);
+
+    // Get all name services configured
+    Set<String> allConfiguredNS = FederationUtil.getAllConfiguredNS(conf);
+
+    // Insert the concurrent nameservice into the set to process together
+    allConfiguredNS.add(CONCURRENT_NS);
+    for (String nsId : allConfiguredNS) {

Review Comment: From the allocation perspective, this is basically consistent with the StaticRouterRpcFairnessPolicyController policy. Can you share your ideas? Thanks.

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> ---
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch
>
> h2. Current shortcomings
> [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a
> StaticRouterRpcFairnessPolicyController to support configuring different
> handlers for different ns. Using the StaticRouterRpcFairnessPolicyController
> allows the router to isolate different ns, and the ns with a higher load will
> not affect the router's access to the ns with a normal load. But the
> StaticRouterRpcFairnessPolicyController still falls short in many ways, such
> as:
> 1. *Configuration is inconvenient and error-prone*: When I use
> StaticRouterRpcFairnessPolicyController, I first need to know how many
> handlers the router has in total, then I have to know how many nameservices
> the router currently has, and then
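The loop under review presumably resolves each nameservice's configured proportion and turns it into a permit count against the router's total handlers. A minimal standalone sketch of that step, where the property name, the 0.1 default, and the rounding rule are all hypothetical stand-ins for the real constants in RBFConfigKeys:

```java
import java.util.Properties;

public class FairnessInitSketch {
    // Hypothetical stand-ins for DFS_ROUTER_FAIR_HANDLER_PROPORTION_KEY_PREFIX
    // and DFS_ROUTER_FAIR_HANDLER_PROPORTION_DEFAULT; the actual key and
    // default value are defined in RBFConfigKeys in the patch.
    static final String PROPORTION_PREFIX = "dfs.federation.router.fairness.handler.proportion.";
    static final double PROPORTION_DEFAULT = 0.1;

    // For one nameservice, read its configured proportion (falling back to the
    // default) and convert it into a permit count against handlerCount.
    static int permitsFor(Properties conf, String nsId, int handlerCount) {
        String v = conf.getProperty(PROPORTION_PREFIX + nsId);
        double proportion = (v == null) ? PROPORTION_DEFAULT : Double.parseDouble(v);
        // Grant at least one permit so a configured ns is never starved outright.
        return Math.max(1, (int) (handlerCount * proportion));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(PROPORTION_PREFIX + "ns1", "0.4");
        System.out.println(permitsFor(conf, "ns1", 20)); // 8
        System.out.println(permitsFor(conf, "ns2", 20)); // no key set: falls back to the default
    }
}
```

Because each ns is resolved independently, nothing forces the proportions to sum to 1.0, which is exactly what enables the sharing-plus-isolation behavior the JIRA describes.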
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and Isolating.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Summary: RBF: ProportionRouterRpcFairnessPolicyController-Sharing and Isolating. (was: RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores)

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and Isolating.
> ---
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: h2. Current shortcomings [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. 
When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 to 14 o'clock, and almost no requests from 14 to 24 o'clock, ns2 has many requests from 12 to 24 o'clock, and almost no requests from 0 to 14 o'clock; when it is between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has more requests and the other ns has almost no requests, so we have wasted half of the number of handlers. 4. *Only isolation, no sharing*: The staticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the purpose. It is impossible for all ns in the cluster to have high loads. On the contrary, in most scenarios, only a few ns in the cluster have high loads, and the loads of most other ns are normal. For ns with higher load and ns with normal load, we need to isolate their handlers so that the ns with higher load will not affect the performance of ns with lower load. However, for nameservices that are also under normal load, or are under higher load, we do not need to isolate them, these ns of the same nature can share the handlers of the router; The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router. h2. 
New features Based on the above staticRouterRpcFairnessPolicyController, there are deficiencies in usage and performance. I provide a new RouterRpcFairnessPolicyController: ProportionRouterRpcFairnessPolicyController (maybe with a better name) to solve the above major shortcomings. 1. *More user-friendly configuration* : Supports allocating handlers proportionally to each ns. For example, we can give ns1 a handler ratio of 0.2, then ns1 will use 0.2 of the total number of handlers on the router. Using this method, we do not need to confirm in advance how many handlers the router has. 2. *Sharing* : was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways,
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 to 14 o'clock, and almost no requests from 14 to 24 o'clock, ns2 has many requests from 12 to 24 o'clock, and almost no requests from 0 to 14 o'clock; when it is between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has more requests and the other ns has almost no requests, so we have wasted half of the number of handlers. 4. *Only isolation, no sharing*: The staticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the purpose. It is impossible for all ns in the cluster to have high loads. On the contrary, in most scenarios, only a few ns in the cluster have high loads, and the loads of most other ns are normal. For ns with higher load and ns with normal load, we need to isolate their handlers so that the ns with higher load will not affect the performance of ns with lower load. However, for nameservices that are also under normal load, or are under higher load, we do not need to isolate them, these ns of the same nature can share the handlers of the router; The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router. Based on the above staticRouterRpcFairnessPolicyController, there are deficiencies in usage and performance. 
I provide a new RouterRpcFairnessPolicyController: ProportionRouterRpcFairnessPolicyController (maybe with a better name) to solve the above major shortcomings. 1. *More user-friendly configuration*: Supports allocating handlers proportionally to each ns. was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 to 14 o'clock, and almost no requests from 14 to 24 o'clock; ns2 has many requests from 12 to 24 o'clock, and almost no requests from 0 to 12 o'clock. Then between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has many requests and the other ns has almost none, so we have wasted half of the number of handlers. 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the purpose. It is impossible for all ns in the cluster to have high loads. On the contrary, in most scenarios, only a few ns in the cluster have high loads, and the loads of most other ns are normal. For ns with higher load and ns with normal load, we need to isolate their handlers so that the ns with higher load will not affect the performance of ns with lower load. However, for nameservices that are all under normal load, or all under higher load, we do not need to isolate them; these ns of the same nature can share the handlers of the router. The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router. Based on the above, the StaticRouterRpcFairnessPolicyController has deficiencies in both usage and performance. 
I provide a new RouterRpcFairnessPolicyController: ProportionRouterRpcFairnessPolicyController (maybe with a better name) to solve the above major shortcomings. was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not
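The proportional allocation proposed above can be illustrated with a small sketch. This is a hypothetical outline only (the class and method names below are invented, not the HDFS-17302 patch): each ns receives a share of the router's total handlers proportional to a configured weight, so the sum can never exceed the total, and a newly mounted ns only needs a weight rather than a recount of every other ns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of proportional permit allocation for RBF fairness.
// Names and behavior are illustrative, not the actual HDFS-17302 code.
public class ProportionAllocationSketch {
  // Given the router's total handler count and a weight per nameservice,
  // derive each ns's permit count from its share of the total weight,
  // so the sum cannot exceed the configured handler total.
  public static Map<String, Integer> allocate(int totalHandlers,
                                              Map<String, Double> weights) {
    double weightSum =
        weights.values().stream().mapToDouble(Double::doubleValue).sum();
    Map<String, Integer> permits = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : weights.entrySet()) {
      // Round down, but keep at least one permit so a new ns stays reachable
      // (a simplification; with many tiny weights this floor could overshoot).
      int share = (int) Math.floor(totalHandlers * e.getValue() / weightSum);
      permits.put(e.getKey(), Math.max(1, share));
    }
    return permits;
  }

  public static void main(String[] args) {
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("ns1", 2.0);
    weights.put("ns2", 1.0);
    weights.put("ns3", 1.0);
    System.out.println(allocate(100, weights)); // {ns1=50, ns2=25, ns3=25}
  }
}
```

Unlike the static controller, misconfiguring a weight here cannot make the sum exceed the router's handlers, which addresses shortcoming 1 directly.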
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 to 14 o'clock, and almost no requests from 14 to 24 o'clock; ns2 has many requests from 12 to 24 o'clock, and almost no requests from 0 to 12 o'clock. Then between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has many requests and the other ns has almost none, so we have wasted half of the number of handlers. 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the purpose. It is impossible for all ns in the cluster to have high loads. On the contrary, in most scenarios, only a few ns in the cluster have high loads, and the loads of most other ns are normal. For ns with higher load and ns with normal load, we need to isolate their handlers so that the ns with higher load will not affect the performance of ns with lower load. However, for nameservices that are all under normal load, or all under higher load, we do not need to isolate them; these ns of the same nature can share the handlers of the router. The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router. Based on the above, the StaticRouterRpcFairnessPolicyController has deficiencies in both usage and performance. 
was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800252#comment-17800252 ] ASF GitHub Bot commented on HDFS-17056: --- huangzhaobo99 commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435970897 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: Thx, @haiyang1987, No problem! Changing the error message: ```java System.err.println(getName() + ": Input invalid arguments.\nUsage: " + getLongUsage()); ``` How about it? cc @ayushtkn > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands which throws out Too Many Argument exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
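The leftover-argument check under discussion can be sketched in isolation. This is an illustrative stand-in, not the ECAdmin code: `getName()` and `getLongUsage()` here are local stubs for the real command methods.

```java
import java.util.List;

// Stand-in sketch for the trailing-argument check discussed in the review.
// getName()/getLongUsage() are local stubs, not the real ECAdmin methods.
public class ArgCheckSketch {
  static String getName() {
    return "-verifyClusterSetup";
  }

  static String getLongUsage() {
    return "[-verifyClusterSetup [-policy ...<policy>...]]";
  }

  // Mirrors the suggested fix: instead of silently ignoring trailing
  // arguments, report them together with the usage string.
  public static String validate(List<String> remainingArgs) {
    if (!remainingArgs.isEmpty()) {
      return getName() + ": Input invalid arguments.\nUsage: " + getLongUsage();
    }
    return null; // arguments parsed cleanly
  }
}
```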
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 to 14 o'clock, and almost no requests from 14 to 24 o'clock; ns2 has many requests from 12 to 24 o'clock, and almost no requests from 0 to 12 o'clock. Then between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has many requests and the other ns has almost none, so we have wasted half of the number of handlers. 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the purpose. It is impossible for all ns in the cluster to have high loads. On the contrary, in most scenarios, only a few ns in the cluster have high loads, and the loads of most other ns are normal. For ns with higher load and ns with normal load, we need to isolate their handlers so that the ns with higher load will not affect the performance of ns with lower load. However, for nameservices that are all under normal load, or all under higher load, we do not need to isolate them; these ns of the same nature can share the handlers of the router. The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all the handlers of the router. was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. 
Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is
[jira] [Updated] (HDFS-17298) Fix NPE in DataNode.handleBadBlock and BlockSender
[ https://issues.apache.org/jira/browse/HDFS-17298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17298: -- Component/s: datanode > Fix NPE in DataNode.handleBadBlock and BlockSender > -- > > Key: HDFS-17298 > URL: https://issues.apache.org/jira/browse/HDFS-17298 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > There are some NPE issues on the DataNode side of our online environment. > The detailed exception information is > {code:java} > 2023-12-20 13:58:25,449 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client DFSClient_NONMAPREDUCE_xxx at /xxx:41452 [Sending > block BP-xxx:blk_xxx]] - xxx:50010:DataXceiver error processing READ_BLOCK > operation src: /xxx:41452 dst: /xxx:50010 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.(BlockSender.java:301) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:607) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > if (!fromScanner && blockScanner.isEnabled()) { > // data.getVolume(block) is null > blockScanner.markSuspectBlock(data.getVolume(block).getStorageID(), > block); > } > {code} > {code:java} > 2023-12-20 13:52:18,844 ERROR datanode.DataNode (DataXceiver.java:run(330)) > [DataXceiver for client /xxx:61052 [Copying block BP-xxx:blk_xxx]] - > xxx:50010:DataXceiver error processing COPY_BLOCK operation src: /xxx:61052 > dst: /xxx:50010 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.datanode.DataNode.handleBadBlock(DataNode.java:4045) > at > 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1163) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:298) > at java.lang.Thread.run(Thread.java:748) > {code} > NPE Code logic: > {code:java} > // Obtain a reference before reading data > volumeRef = datanode.data.getVolume(block).obtainReference(); > //datanode.data.getVolume(block) is null > {code} > We need to fix it.
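Both stack traces come down to dereferencing the result of `data.getVolume(block)` without a null check. A minimal sketch of the guard, with a simplified stand-in `Volume` type rather than the real HDFS classes:

```java
// Minimal sketch of the null guard implied by both stack traces above:
// the volume lookup can return null when the replica has been removed
// concurrently, so check before dereferencing. Volume is a simplified
// stand-in type, not the real HDFS volume interface.
public class NullVolumeGuardSketch {
  public interface Volume {
    String getStorageID();
  }

  // Returns the storage id, or null (instead of throwing an NPE)
  // when the volume lookup missed.
  public static String storageIdOrNull(Volume volume) {
    return volume == null ? null : volume.getStorageID();
  }
}
```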
[jira] [Updated] (HDFS-17297) The NameNode should remove block from the BlocksMap if the block is marked as deleted.
[ https://issues.apache.org/jira/browse/HDFS-17297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-17297: -- Component/s: namenode > The NameNode should remove block from the BlocksMap if the block is marked as > deleted. > -- > > Key: HDFS-17297 > URL: https://issues.apache.org/jira/browse/HDFS-17297 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > When calling the internalReleaseLease method: > {code:java} > boolean internalReleaseLease( > ... > int minLocationsNum = 1; > if (lastBlock.isStriped()) { > minLocationsNum = ((BlockInfoStriped) lastBlock).getRealDataBlockNum(); > } > if (uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0) { > // There is no datanode reported to this block. > // may be client have crashed before writing data to pipeline. > // This blocks doesn't need any recovery. > // We can remove this block and close the file. > pendingFile.removeLastBlock(lastBlock); > finalizeINodeFileUnderConstruction(src, pendingFile, > iip.getLatestSnapshotId(), false); > ... > } > {code} > if the condition `uc.getNumExpectedLocations() < minLocationsNum && > lastBlock.getNumBytes() == 0` is met during the execution of UNDER_RECOVERY > logic, the block is removed from the block list in the inode file and marked > as deleted. > However it is not removed from the BlocksMap, which may cause a memory leak. > Therefore it is necessary to remove the block from the BlocksMap at this > point as well.
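The leak can be shown with a toy stand-in for the NameNode structures (hypothetical classes, not the real BlocksMap/INodeFile): if removing the last block only touched the file's block list and skipped the map removal, the map entry would live on indefinitely.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the structures described above (not the real
// BlocksMap/INodeFile): it shows why discarding the last block from the
// file alone would leak the global map entry.
public class BlocksMapLeakSketch {
  // Global blockId -> owning-file map, analogous in spirit to BlocksMap.
  private final Map<Long, String> blocksMap = new HashMap<>();

  public void addBlock(long blockId, String file) {
    blocksMap.put(blockId, file);
  }

  // The proposed fix: when the zero-length last block is discarded from
  // the inode, also drop it from the global map; without this removal the
  // entry would remain forever (the leak described in the issue).
  public void removeLastBlock(long blockId) {
    blocksMap.remove(blockId);
  }

  public int size() {
    return blocksMap.size();
  }
}
```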
[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800250#comment-17800250 ] ASF GitHub Bot commented on HDFS-17302: --- hadoop-yetus commented on PR #6380: URL: https://github.com/apache/hadoop/pull/6380#issuecomment-1868772311 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 17m 10s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 46m 41s | | trunk passed | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 30s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 41s | | trunk passed | | +1 :green_heart: | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 31s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 22s | | trunk passed | | +1 :green_heart: | shadedclient | 38m 2s | | branch has no errors when building and testing our client artifacts. | | -0 :warning: | patch | 38m 23s | | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 32s | | the patch passed | | +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 33s | | the patch passed | | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6380/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | mvnsite | 0m 32s | | the patch passed | | +1 :green_heart: | javadoc | 0m 28s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 23s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 37m 43s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 23m 6s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. 
| | | | 176m 57s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6380/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6380 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 9178f4c3fcd6 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 10a9459725fc05183b95fb4917f679a45cbe1bc7 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6380/1/testReport/ | | Max. process+thread count | 2399 (vs. ulimit of
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Waste handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 o'clock to 14 o'clock, and almost no requests from 14 o'clock to 24 o'clock. ns2 has many requests from 12 o'clock to 24 o'clock, and almost no requests from 0 o'clock to 12 o'clock; when it is between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has more requests and the other ns has almost no requests, so we have wasted half of the number of handlers. was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. 
Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 3. *Cannot share handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Cannot share handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to enable the router to access ns with normal load and not be affected by ns with higher load. First of all, not all ns have high loads; secondly, ns with high loads do not have high loads 24 hours a day. It may be that only certain time periods, such as 0 to 8 o'clock, have high loads, and other time periods have normal loads. Assume there are 2 ns, and each ns is allocated half of the number of handlers. Assume that ns1 has many requests from 0 o'clock to 14 o'clock, and almost no requests from 14 o'clock to 24 o'clock. ns2 has many requests from 12 o'clock to 24 o'clock, and almost no requests from 0 o'clock to 12 o'clock; when it is between 0 o'clock and 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has more requests and the other ns has almost no requests, so we have wasted half of the number of handlers. was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. 
Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 3. *Cannot share handlers*: > RBF: ProportionRouterRpcFairnessPolicyController-support proportional >
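The proportional allocation this ticket proposes can be sketched roughly as follows. This is an illustrative sketch only: the class and method names are hypothetical, not the actual patch. It assumes each ns is given a fraction of the router's total handlers, with overlapping fractions deliberately allowed so that handlers idle for one ns can serve another:

```java
// Hypothetical sketch of the proportional idea behind HDFS-17302: instead of
// a fixed handler count per nameservice (which must be kept in sync with the
// router's total by hand), each ns is assigned a fraction of the total.
import java.util.LinkedHashMap;
import java.util.Map;

public class ProportionalHandlers {
    // Allocate permits per ns from a total, given per-ns proportions in (0, 1].
    static Map<String, Integer> allocate(int totalHandlers, Map<String, Double> proportions) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : proportions.entrySet()) {
            // Proportions may sum past 1.0: overlapping shares let two ns with
            // peaks at different times of day share the same handlers.
            result.put(e.getKey(), Math.max(1, (int) (totalHandlers * e.getValue())));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> p = new LinkedHashMap<>();
        p.put("ns1", 0.8); // ns1 may use up to 80% of the handlers
        p.put("ns2", 0.8); // ns2 may also use up to 80% (shares overlap)
        System.out.println(allocate(100, p)); // {ns1=80, ns2=80}
    }
}
```

A new ns would only need its proportion configured, rather than a recalculation of every ns's absolute handler count against the router total.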
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800248#comment-17800248 ] Ayush Saxena commented on HDFS-17299: - Yeps, excluding a rack in the streamer is quite tricky; we know neither the BPP nor the cluster rack configuration during {{DataStreamer}} setup. Maybe we should consider dropping the datanode from the pipeline, if possible, and reattempting with the remaining datanodes if we can't replace it. Similar to {{bestEffort}} in the normal {{DatanodeReplacement}} case after the stream has been created. [https://github.com/apache/hadoop/blob/rel/release-2.10.2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/ReplaceDatanodeOnFailure.java#L114-L125] On the Namenode side... I don't think we have anything better than stale-node detection, which just brings the time duration down rather than fixing it; beyond that I am also not very sure there is any other clean way to handle this. > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. 
> # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. > > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, 
status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on
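For context on the "20.5mins" figure in the quoted description: the NameNode declares a datanode dead after 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval. A minimal sketch, assuming recheck-interval = 600000 ms and heartbeat interval = 3 s (assumed values consistent with the 20.5-minute figure; the raw numbers in the quoted config appear truncated):

```java
// Sketch of the NameNode's dead-node detection interval. The formula
// (2 * recheckIntervalMs + 10 * heartbeatIntervalMs) mirrors how
// DatanodeManager computes heartbeat expiry; the concrete values below are
// assumptions matching the 20.5-minute figure quoted in the report.
public class HeartbeatExpiry {
    static long expiryMs(long recheckIntervalMs, long heartbeatIntervalMs) {
        return 2 * recheckIntervalMs + 10 * heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        long expiry = expiryMs(600_000L, 3_000L); // recheck 10 min, heartbeat 3 s
        System.out.println(expiry + " ms = " + (expiry / 60_000.0) + " minutes");
        // prints: 1230000 ms = 20.5 minutes
    }
}
```

Until that interval elapses the NameNode keeps handing out the "dead" AZ's datanodes in new pipelines, which is exactly the window in which the logs above were produced.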
[jira] [Commented] (HDFS-17254) DataNode httpServer has too many worker threads
[ https://issues.apache.org/jira/browse/HDFS-17254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800247#comment-17800247 ] ASF GitHub Bot commented on HDFS-17254: --- xinglin commented on code in PR #6307: URL: https://github.com/apache/hadoop/pull/6307#discussion_r1435954635 ## hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml: ## @@ -154,6 +154,14 @@ + + dfs.datanode.netty.worker.threads + 10 Review Comment: I don't have a strong opinion here. Either way works for me. > DataNode httpServer has too many worker threads > --- > > Key: HDFS-17254 > URL: https://issues.apache.org/jira/browse/HDFS-17254 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Liangjun He >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > > When optimizing the thread number of high-density storage DN, we found the > number of worker threads for the DataNode httpServer is twice the number of > available cores on node , resulting in too many threads. We can change this > to be configurable. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
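The change under discussion can be sketched as below. This is a hedged illustration, not the actual patch: the config key matches the hdfs-default.xml diff above, but the class name and fallback logic are assumptions:

```java
// Sketch of the idea in HDFS-17254: make the DataNode HTTP server's Netty
// worker-thread count configurable instead of relying on Netty's default of
// 2 x available cores (too many threads on high-density storage nodes).
public class WorkerThreads {
    // Config key from the hdfs-default.xml diff above.
    static final String DFS_DATANODE_NETTY_WORKER_THREADS =
        "dfs.datanode.netty.worker.threads";

    // A configured value <= 0 falls back to Netty's 2-per-core default.
    static int workerThreads(int configured, int availableCores) {
        return configured > 0 ? configured : 2 * availableCores;
    }

    public static void main(String[] args) {
        System.out.println(workerThreads(10, 64)); // configured: 10 threads
        System.out.println(workerThreads(0, 64));  // unset: falls back to 128
    }
}
```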
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800243#comment-17800243 ] Xiaoqiao He commented on HDFS-17299: Did the connection meet some issues? Seems we both have the same opinion, but I don't have an idea to fix it smoothly: on the NameNode side it doesn't recognise the dead nodes/racks in time, and on the client side it doesn't know how many racks are in the cluster. Any ideas? > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at >
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800241#comment-17800241 ] Xiaoqiao He commented on HDFS-17299: [~shahrs87] Please reference here: https://github.com/apache/hadoop/blob/415e9bdfbdeebded520e0233bcb91a487411a94b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1641 IMO, if there are only two racks in the cluster and one of them is out of service, the writer will always fail with the default configuration. I think we should fix this corner-case issue. Let's wait and see whether [~ayushtkn] has any other suggestions. > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 >
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800240#comment-17800240 ] Ayush Saxena commented on HDFS-17299: - [~shahrs87] that config kicks in post pipeline setup, not while creating one. So, I think your failure is during create itself. [https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1571-L1573] It won't reach here in your case since the pipeline wasn't set up, so nodes will be null here. [https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1455] Which I feel is a bug, or at least warrants some improvement. :( The end solution is to go ahead with 2 nodes in the pipeline; how to get there we can figure out, most likely via ReplaceDatanodeOnFailure. [~hexiaoqiao] For the default BPP the case would be 2 racks with one rack down, where the Namenode doesn't recognise the rack as down. But the case mentioned here is for the rack-fault-tolerant BPP: 3 racks, replication factor 3 & 1 rack down, but the NN doesn't recognise that as dead, so it always tries to allocate a node from all 3 racks, though 1 rack is dead & the create never succeeds. I have added a patch with a repro test, you can give it a check (a very quick patch, maybe wrong). Interesting problem :) > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. 
We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. > > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] 
hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) >
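The failure mode described in the comments above can be illustrated with a toy model (hypothetical names, not HDFS code): with the rack-fault-tolerant BPP, 3 racks and replication factor 3, the NameNode wants one replica per rack, and as long as it still counts the dead rack's nodes as alive, every pipeline it hands out includes an unreachable node:

```java
// Toy illustration of HDFS-17299: the NameNode, unaware a whole rack (AZ)
// is down, keeps placing one replica in each of the 3 racks, so every
// pipeline contains an unreachable datanode and block creation never succeeds.
import java.util.List;
import java.util.Set;

public class RackPlacementDemo {
    // A pipeline can only succeed if none of the chosen racks is dead.
    static boolean pipelineCanSucceed(List<String> chosenRacks, Set<String> deadRacks) {
        return chosenRacks.stream().noneMatch(deadRacks::contains);
    }

    public static void main(String[] args) {
        Set<String> dead = Set.of("AZ-2"); // the AZ that was brought down
        // The NN, treating all racks as alive, always picks one node per rack:
        List<String> nnChoice = List.of("AZ-1", "AZ-2", "AZ-3");
        System.out.println(pipelineCanSucceed(nnChoice, dead)); // false: create keeps failing
    }
}
```

A two-rack pipeline that simply skips the dead rack would succeed, which is the "go ahead with 2 nodes in the pipeline" direction discussed above.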
[jira] [Updated] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HDFS-17299: Attachment: repro.patch > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 123 ms (20.5mins) to detect that datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find blocks locations that satisfies the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still alive to namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,712 WARN [Thread-39087] hdfs.DataStreamer -
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800238#comment-17800238 ] ASF GitHub Bot commented on HDFS-17056: --- haiyang1987 commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435938037 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: Agree with @ayushtkn , the current behaviour works as designed: if you don't specify any policy, the result is to validate all the enabled policies. Maybe the current ticket should only fix the verifyClusterSetup output for an invalid param, thanks~ > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands which throw a Too Many Arguments exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
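The behaviour being fixed can be modelled as below. This is a simplified, hypothetical model of the check shown in the PR diff above, not the actual ECAdmin code:

```java
// Minimal model of the argument check discussed for HDFS-17056's
// verifyClusterSetup: without a -policy flag, any leftover arguments
// (e.g. a bare policy name) should be rejected rather than silently ignored.
import java.util.List;

public class VerifyClusterSetupArgs {
    // Returns an error message for stray arguments, or null if args are fine.
    static String check(boolean hasPolicyFlag, List<String> remainingArgs) {
        if (!hasPolicyFlag && !remainingArgs.isEmpty()) {
            return "-verifyClusterSetup: Too many arguments";
        }
        return null; // fall through to validating all enabled policies
    }

    public static void main(String[] args) {
        // Models "hdfs ec -verifyClusterSetup XOR-2-1-1024k" (no -policy flag):
        System.out.println(check(false, List.of("XOR-2-1-1024k")));
    }
}
```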
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of handlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. *Extension ns is not supported*: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 
3. *Cannot share handlers*: was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. Configuration is inconvenient and error-prone: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then I have to know how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns will not exceed the total handlers of the router, and I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure only one more handler for a certain ns, the total number is more than the number of hadnlers owned by the router, which will also cause the router to fail to start. At this time, I had to investigate the reason why the router failed to start. After finding the reason, I had to reconsider the number of handlers for each ns. 2. Extension ns is not supported: During the running of the router, if a new ns is added to the cluster and a mount is added for the ns, but because no handler is allocated for the ns, the ns cannot be accessed through the router. We must reconfigure the number of handlers and then refresh the configuration. At this time, the router can access the ns normally. When we reconfigure the number of handlers, we have to face disadvantage 1: Configuration is inconvenient and error-prone. 3. 
Cannot share handlers: > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch > > > [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a > StaticRouterRpcFairnessPolicyController to support configuring different > handlers for different ns. Using the StaticRouterRpcFairnessPolicyController > allows the router to isolate different ns, and the ns with a higher load will > not affect the router's access to the ns with a normal load. But the >
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, so an ns with a higher load will not affect the router's access to an ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as: 1. Configuration is inconvenient and error-prone: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers for all ns does not exceed the total handlers of the router; I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. If I configure even one more handler for a certain ns, so that the total exceeds the number of handlers owned by the router, the router will fail to start. Then I have to investigate why the router failed to start and, after finding the reason, reconsider the number of handlers for each ns. 2. Extending with a new ns is not supported: While the router is running, if a new ns is added to the cluster and a mount is added for it, the ns cannot be accessed through the router because no handlers have been allocated for it. We must reconfigure the number of handlers and then refresh the configuration; only then can the router access the ns normally. When we reconfigure the number of handlers, we face disadvantage 1 again: configuration is inconvenient and error-prone. 3. 
Cannot share handlers: was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, so an ns with a higher load will not affect the router's access to an ns with a normal load. > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch > > > [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a > StaticRouterRpcFairnessPolicyController to support configuring different > handlers for different ns. Using the StaticRouterRpcFairnessPolicyController > allows the router to isolate different ns, so an ns with a higher load will > not affect the router's access to an ns with a normal load. But the > StaticRouterRpcFairnessPolicyController still falls short in many ways, such > as: > 1. Configuration is inconvenient and error-prone: When I use > StaticRouterRpcFairnessPolicyController, I first need to know how many > handlers the router has in total, then how many nameservices > the router currently has, and then carefully calculate how many handlers to > allocate to each ns so that the sum of handlers for all ns does not exceed > the total handlers of the router; I also need to consider how many > handlers to allocate to each ns to achieve better performance. Therefore, I > need to be very careful when configuring. 
If I configure even one more > handler for a certain ns, so that the total exceeds the number of > handlers owned by the router, the router will fail to > start. Then I have to investigate why the router failed to > start and, after finding the reason, reconsider the number of handlers > for each ns. > 2. Extending with a new ns is not supported: While the router is running, if a new > ns is added to the cluster and a mount is added for it, the ns cannot be > accessed through the router because no handlers have been allocated for it. > We must reconfigure the number of handlers and then refresh the > configuration; only then can the router access the ns normally. When we > reconfigure the number of handlers, we face disadvantage 1 again: > configuration is inconvenient and error-prone. >
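For context on what "proportional allocation of semaphores" could look like, here is a minimal stand-alone sketch. The class and method names (`ProportionAllocationSketch`, `allocatePermits`) are hypothetical and this is not the actual ProportionRouterRpcFairnessPolicyController code: each ns is assigned a fraction of the router's total handlers, and because fractions may overlap (sum past 1.0), handlers are effectively shared rather than statically partitioned, which speaks to the "cannot share handlers" point above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of proportional semaphore allocation for router handlers.
public class ProportionAllocationSketch {

  // Give each nameservice a semaphore sized as a fraction of the total handlers.
  // Fractions may overlap (sum > 1.0), so nameservices share the handler pool
  // instead of statically partitioning it, and a misconfigured sum cannot
  // prevent the router from starting.
  static Map<String, Semaphore> allocatePermits(int totalHandlers,
                                                Map<String, Double> proportions) {
    Map<String, Semaphore> permits = new HashMap<>();
    for (Map.Entry<String, Double> e : proportions.entrySet()) {
      int n = (int) Math.round(totalHandlers * e.getValue());
      permits.put(e.getKey(), new Semaphore(Math.max(1, n))); // at least 1 permit per ns
    }
    return permits;
  }

  public static void main(String[] args) {
    Map<String, Double> proportions = new HashMap<>();
    proportions.put("ns1", 0.5);  // half of the handlers
    proportions.put("ns2", 0.8);  // overlaps with ns1: handlers are shared
    Map<String, Semaphore> permits = allocatePermits(100, proportions);
    System.out.println(permits.get("ns1").availablePermits()); // 50
    System.out.println(permits.get("ns2").availablePermits()); // 80
  }
}
```

Under this scheme a newly mounted ns can simply be given a default proportion, without recomputing every other ns's handler count and without the sum-must-not-exceed-total startup check.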
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800236#comment-17800236 ] ASF GitHub Bot commented on HDFS-17056: --- huangzhaobo99 commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435934977 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: Thx, I understand. It is necessary to modify the protocol in order to support policy-level validation. > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands, which throw a Too Many Arguments exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800235#comment-17800235 ] ASF GitHub Bot commented on HDFS-17056: --- ayushtkn commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435934137 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: If you pass multiple policies, that means you want a combined result, like whether all of them are supported in the "cluster"; if you want one policy, pass one policy only. The whole design is to verify things at cluster level, not at policy level. It was created for cluster-admin-level usage, to highlight cluster-level setup issues, like cases where not all the enabled policies are supported & things like that. You can add an additional option which tells the result per policy if that option is provided; in that case, after getting the result, it can loop over the policies and get the individual results. Changing the proto & all is among the last things to do, but I don't think it is required as of now. > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. 
> In case there are additional invalid options it silently ignores them, unlike > other EC commands, which throw a Too Many Arguments exception.
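The guard under review in the diff above can be illustrated in isolation. This is a simplified, hypothetical sketch (the real check lives inside ECAdmin's command handling and writes to System.err); the `check` helper and its null-means-valid convention are assumptions made for the example:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-alone sketch of the "too many arguments" guard discussed above.
public class VerifyArgsSketch {

  // Returns an error message when positional arguments are left over, or null if
  // the arguments are valid. This mirrors the behaviour of other EC subcommands,
  // which fail fast on extra tokens instead of silently ignoring them.
  static String check(String commandName, List<String> remainingArgs) {
    if (!remainingArgs.isEmpty()) {
      return commandName + ": Too many arguments";
    }
    return null;
  }

  public static void main(String[] args) {
    // A policy name given without the -policy flag is a leftover token and is rejected.
    System.out.println(check("-verifyClusterSetup", Arrays.asList("XOR-2-1-1024k")));
    System.out.println(check("-verifyClusterSetup", Collections.emptyList()));
  }
}
```

This is exactly the behaviour the Jira asks for: `-verifyClusterSetup XOR-2-1-1024k` (missing `-policy`) should produce an error rather than being silently treated as "verify all enabled policies".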
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] : Improved isolation for downstream name nodes, which provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch > > > [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] : Improved > isolation for downstream name nodes, which provides a > StaticRouterRpcFairnessPolicyController to support configuring different > handlers for different ns. Using the StaticRouterRpcFairnessPolicyController > allows the router to isolate different ns, and the ns with a higher load will > not affect the router's access to the ns with a normal load.
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Description: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] which provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load. (was: [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] : Improved isolation for downstream name nodes, which provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, and the ns with a higher load will not affect the router's access to the ns with a normal load.) > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch > > > [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] which provides > a StaticRouterRpcFairnessPolicyController to support configuring different > handlers for different ns. Using the StaticRouterRpcFairnessPolicyController > allows the router to isolate different ns, and the ns with a higher load will > not affect the router's access to the ns with a normal load.
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Attachment: HDFS-17302.002.patch > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch > >
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian Zhang updated HDFS-17302: -- Attachment: HDFS-17302.001.patch Status: Patch Available (was: Open) > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available > Attachments: HDFS-17302.001.patch > >
[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800231#comment-17800231 ] ASF GitHub Bot commented on HDFS-17302: --- KeeProMise opened a new pull request, #6380: URL: https://github.com/apache/hadoop/pull/6380 ### Description of PR ### How was this patch tested? ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major >
[jira] [Updated] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17302: -- Labels: pull-request-available (was: ) > RBF: ProportionRouterRpcFairnessPolicyController-support proportional > allocation of semaphores > -- > > Key: HDFS-17302 > URL: https://issues.apache.org/jira/browse/HDFS-17302 > Project: Hadoop HDFS > Issue Type: New Feature > Components: rbf >Reporter: Jian Zhang >Assignee: Jian Zhang >Priority: Major > Labels: pull-request-available >
[jira] [Created] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores
Jian Zhang created HDFS-17302: - Summary: RBF: ProportionRouterRpcFairnessPolicyController-support proportional allocation of semaphores Key: HDFS-17302 URL: https://issues.apache.org/jira/browse/HDFS-17302 Project: Hadoop HDFS Issue Type: New Feature Components: rbf Reporter: Jian Zhang Assignee: Jian Zhang
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800223#comment-17800223 ] ASF GitHub Bot commented on HDFS-17056: --- huangzhaobo99 commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435902418 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: @ayushtkn Thx for your Q, this is the design of the API itself. However, the combined message cannot clearly explain, for each EC policy, why it is not supported. > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands, which throw a Too Many Arguments exception.
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800217#comment-17800217 ] Rushabh Shah commented on HDFS-17299: - > Maybe if you would have put > dfs.client.block.write.replace-datanode-on-failure.enable as false, it > wouldn't have tried to replace the DN itself & went ahead with 2 DN from > other AZ? It is entirely possible that I am not reading the code right. I am a little bit out of sync with the DataStreamer codebase. But I don't see this config property dfs.client.block.write.replace-datanode-on-failure.enable being used anywhere in the PIPELINE_SETUP_CREATE phase. I am looking at the branch-2.10 branch. This is the code flow: [DataStreamer#run()|https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L708-L711] --> [nextBlockOutputStream()|https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1655] --> [createBlockOutputStream()|https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1751] There is a retry within nextBlockOutputStream via dfs.client.block.write.retries, but it doesn't take dfs.client.block.write.replace-datanode-on-failure.enable into consideration. Cc [~ayushtkn] [~hexiaoqiao] > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. 
We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 1230000 ms (20.5 mins) to detect that a datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find block locations that satisfy the rack > placement policy (one copy in each rack, which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still considered alive by the namenode, > the client gets different datanodes, but all of them are still in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. > > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=true ugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > 
DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at >
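The roughly 20.5-minute detection delay quoted in the report above follows from the NameNode's dead-node expiry formula, 2 * recheck interval + 10 * heartbeat interval. A quick check of that arithmetic, assuming a recheck interval of 600000 ms (the exact configured value is truncated in the report) and a heartbeat interval of 3 s:

```java
// Reproduces the dead-DataNode detection delay implied by the report's configuration.
public class DeadNodeDetectionSketch {
  public static void main(String[] args) {
    // Assumed values: dfs.namenode.heartbeat.recheck-interval = 600000 ms and
    // dfs.heartbeat.interval = 3 s, chosen to match the quoted 20.5 minutes.
    long recheckMs = 600_000;
    long heartbeatSec = 3;
    // The NameNode declares a DataNode dead after 2 * recheck + 10 * heartbeat.
    long deadIntervalMs = 2 * recheckMs + 10 * heartbeatSec * 1000;
    System.out.println(deadIntervalMs);            // 1230000
    System.out.println(deadIntervalMs / 60_000.0); // 20.5
  }
}
```

Until that interval elapses, the NameNode keeps handing out datanodes from the downed AZ, which is why the client keeps hitting dead pipelines in the logs above.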
[jira] [Commented] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.
[ https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800201#comment-17800201 ] ASF GitHub Bot commented on HDFS-17301: --- hadoop-yetus commented on PR #6377: URL: https://github.com/apache/hadoop/pull/6377#issuecomment-1868572225

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:-------:|
| +0 :ok: | reexec | 0m 50s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 13m 47s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 35m 29s | | trunk passed |
| +1 :green_heart: | compile | 18m 9s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 16m 28s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 4m 38s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 12s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 28s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 40s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 1s | | trunk passed |
| +1 :green_heart: | shadedclient | 40m 34s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 6s | | the patch passed |
| +1 :green_heart: | compile | 17m 34s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 34s | | the patch passed |
| +1 :green_heart: | compile | 16m 26s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 16m 26s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 4m 34s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 8s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 23s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 32s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 20s | | the patch passed |
| +1 :green_heart: | shadedclient | 40m 30s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 11s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 251m 1s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. |
| | | | 513m 4s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6377/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6377 |
| Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
| uname | Linux d3840b787b54 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / ecb37461d2ce1ebe233aa07f552d40c1cf9a1e42 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6377/4/testReport/ |
| Max. process+thread count | 2804 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output |
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800200#comment-17800200 ] ASF GitHub Bot commented on HDFS-17056: --- ayushtkn commented on code in PR #6379: URL: https://github.com/apache/hadoop/pull/6379#discussion_r1435864191 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/ECAdmin.java: ## @@ -642,6 +642,10 @@ public int run(Configuration conf, List<String> args) throws IOException { throw e; } } else { +if (args.size() > 0) { + System.err.println(getName() + ": Too many arguments"); Review Comment: It works as designed: if you don't specify any policy, the result indicates whether all the enabled policies can work, along with the minimum number of DataNodes required for each of them. It validates the cluster setup for the enabled policies, not a per-policy setup. If you specify multiple policies, it provides the aggregate result, i.e. whether all those policies can work on the cluster, and if not, a combined message stating how many DataNodes are required, like ``` 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, XOR-2-1-1024k. The number of DataNodes is only 3. ``` Here, 9 is the required number because of RS-6-3. If we intend to improve the behaviour, we can consider whether there is a need. > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. 
> In case there are additional invalid options it silently ignores them, unlike > other EC commands which throw a Too Many Arguments exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
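The reviewer's explanation above can be sketched as a small calculation: each EC policy needs at least (data units + parity units) DataNodes, and the aggregate verification result is driven by the most demanding enabled policy. The sketch below is illustrative only and is not Hadoop's actual `ECTopologyVerifier` logic; the class and method names are hypothetical.

```java
import java.util.List;

public class EcMinDataNodes {
    // Hypothetical helper: an EC policy needs one DataNode per block-group
    // unit, i.e. data units plus parity units (RS-6-3 needs 6 + 3 = 9).
    static int minDataNodes(int dataUnits, int parityUnits) {
        return dataUnits + parityUnits;
    }

    // Aggregate result over several policies: the cluster must satisfy the
    // most demanding policy, so take the maximum requirement.
    static int aggregateMinDataNodes(List<int[]> policies) {
        int max = 0;
        for (int[] p : policies) {
            max = Math.max(max, minDataNodes(p[0], p[1]));
        }
        return max;
    }

    public static void main(String[] args) {
        // RS-6-3-1024k and XOR-2-1-1024k, as in the example output above.
        List<int[]> policies = List.of(new int[]{6, 3}, new int[]{2, 1});
        System.out.println(aggregateMinDataNodes(policies)); // 9, driven by RS-6-3
    }
}
```

This mirrors why the combined message reports 9 DataNodes even though XOR-2-1 alone would only need 3.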
[jira] [Commented] (HDFS-17293) First packet data + checksum size will be set to 516 bytes when writing to a new block.
[ https://issues.apache.org/jira/browse/HDFS-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800196#comment-17800196 ] ASF GitHub Bot commented on HDFS-17293: --- hfutatzhanghb commented on code in PR #6368: URL: https://github.com/apache/hadoop/pull/6368#discussion_r1435849521 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java: ## @@ -536,8 +536,13 @@ protected void adjustChunkBoundary() { } if (!getStreamer().getAppendChunk()) { - final int psize = (int) Math - .min(blockSize - getStreamer().getBytesCurBlock(), writePacketSize); + int psize = 0; + if (blockSize == getStreamer().getBytesCurBlock()) { Review Comment: @Hexiaoqiao Sir, thanks for your reply. I will add unit tests soon when I am available. > First packet data + checksum size will be set to 516 bytes when writing to a > new block. > --- > > Key: HDFS-17293 > URL: https://issues.apache.org/jira/browse/HDFS-17293 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > > First packet size will be set to 516 bytes when writing to a new block. > In method computePacketChunkSize, the parameters psize and csize would be > (0, 512) > when writing to a new block. It would be better to use writePacketSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
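The arithmetic behind the diff above can be illustrated in isolation: when bytesCurBlock equals blockSize (the current block is exactly full), the pre-patch expression min(blockSize - bytesCurBlock, writePacketSize) yields 0, so computePacketChunkSize effectively falls back to a single 512-byte chunk plus its 4-byte checksum, the 516 bytes from the issue title. The following is a minimal standalone sketch, not the actual DFSOutputStream code; the helper names are hypothetical and a default 64 KiB writePacketSize is assumed.

```java
public class FirstPacketSize {
    static final int CHUNK_DATA = 512; // bytes of data per checksum chunk
    static final int CHECKSUM = 4;     // bytes of CRC per chunk

    // Stand-in for the pre-patch psize expression:
    // min(remaining bytes in the block, writePacketSize).
    static int oldPsize(long blockSize, long bytesCurBlock, int writePacketSize) {
        return (int) Math.min(blockSize - bytesCurBlock, writePacketSize);
    }

    // Sketch of the patched intent: when the current block is exactly full,
    // size the next packet from writePacketSize instead of the 0 remainder.
    static int newPsize(long blockSize, long bytesCurBlock, int writePacketSize) {
        if (blockSize == bytesCurBlock) {
            return writePacketSize;
        }
        return oldPsize(blockSize, bytesCurBlock, writePacketSize);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MiB block, assumed
        int writePacketSize = 64 * 1024;     // 64 KiB packet, assumed

        // Block just filled: the old math leaves room for zero body bytes,
        // so the first packet of the next block carries only one chunk:
        // 512 data + 4 checksum = 516 bytes.
        System.out.println(oldPsize(blockSize, blockSize, writePacketSize)); // 0
        System.out.println(CHUNK_DATA + CHECKSUM);                           // 516
        System.out.println(newPsize(blockSize, blockSize, writePacketSize)); // 65536
    }
}
```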
[jira] [Commented] (HDFS-17273) Improve local variables duration of DataStreamer for better debugging.
[ https://issues.apache.org/jira/browse/HDFS-17273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800182#comment-17800182 ] ASF GitHub Bot commented on HDFS-17273: --- hfutatzhanghb commented on PR #6321: URL: https://github.com/apache/hadoop/pull/6321#issuecomment-1868518216 > Committed to trunk. Thanks @hfutatzhanghb , @haiyang1987 and @tomscut . @Hexiaoqiao Sir, thanks a lot for your merging and reviewing. Thanks @haiyang1987 @tomscut for reviewing carefully. > Improve local variables duration of DataStreamer for better debugging. > --- > > Key: HDFS-17273 > URL: https://issues.apache.org/jira/browse/HDFS-17273 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: farmmamba >Assignee: farmmamba >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800168#comment-17800168 ] ASF GitHub Bot commented on HDFS-17056: --- huangzhaobo99 commented on PR #6379: URL: https://github.com/apache/hadoop/pull/6379#issuecomment-1868495316 > It may be necessary to modify the erasurecoding.proto file, changing `required` to `repeated`, similar to `DistributedFileSystem#addErasureCodingPolicies`. > > This modification will result in a many-to-many relationship. > > ```proto > // now > message GetECTopologyResultForPoliciesResponseProto { > required ECTopologyVerifierResultProto response = 1; > } > > // expect > message GetECTopologyResultForPoliciesResponseProto { > repeated ECTopologyVerifierResultProto response = 1; > } > ``` Hi @ayushtkn @haiyang1987, if you have time, let's discuss it together. Thanks. > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands which throw a Too Many Arguments exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.
[ https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800165#comment-17800165 ] ASF GitHub Bot commented on HDFS-17056: --- huangzhaobo99 commented on PR #6379: URL: https://github.com/apache/hadoop/pull/6379#issuecomment-1868494355 It may be necessary to modify the erasurecoding.proto file, changing `required` to `repeated`, similar to `DistributedFileSystem#addErasureCodingPolicies`. This modification will result in a many-to-many relationship. ```proto // now message GetECTopologyResultForPoliciesResponseProto { required ECTopologyVerifierResultProto response = 1; } // expect message GetECTopologyResultForPoliciesResponseProto { repeated ECTopologyVerifierResultProto response = 1; } ``` > EC: Fix verifyClusterSetup output in case of an invalid param. > -- > > Key: HDFS-17056 > URL: https://issues.apache.org/jira/browse/HDFS-17056 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec >Reporter: Ayush Saxena >Assignee: huangzhaobo99 >Priority: Major > Labels: newbie, pull-request-available > > {code:java} > bin/hdfs ec -verifyClusterSetup XOR-2-1-1024k > 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, > XOR-2-1-1024k. The number of DataNodes is only 3. {code} > verifyClusterSetup requires -policy then the name of policies, else it > defaults to all enabled policies. > In case there are additional invalid options it silently ignores them, unlike > other EC commands which throw a Too Many Arguments exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17301) Add read and write dataXceiver threads count metrics to datanode.
[ https://issues.apache.org/jira/browse/HDFS-17301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800154#comment-17800154 ] ASF GitHub Bot commented on HDFS-17301: --- hadoop-yetus commented on PR #6377: URL: https://github.com/apache/hadoop/pull/6377#issuecomment-1868473774

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 11m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 19s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 31m 57s | | trunk passed |
| +1 :green_heart: | compile | 17m 10s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 15m 58s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 5m 0s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 19s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 26s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 30s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 28s | | trunk passed |
| +1 :green_heart: | shadedclient | 39m 28s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 12s | | the patch passed |
| +1 :green_heart: | compile | 17m 40s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 17m 40s | | the patch passed |
| +1 :green_heart: | compile | 16m 51s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 16m 51s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 4m 39s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 10s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 16s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 2m 34s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 6m 47s | | the patch passed |
| +1 :green_heart: | shadedclient | 40m 26s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 44s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 216m 36s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 6s | | The patch does not generate ASF License warnings. |
| | | | 485m 45s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6377/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6377 |
| Optional Tests | dupname asflicense mvnsite codespell detsecrets markdownlint compile javac javadoc mvninstall unit shadedclient spotbugs checkstyle |
| uname | Linux e59912d06f26 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / e65f7a52ff6f6013bfaebe0f6dcb04bd13c78d30 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6377/3/testReport/ |
| Max. process+thread count | 3699 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | |
[jira] [Commented] (HDFS-17293) First packet data + checksum size will be set to 516 bytes when writing to a new block.
[ https://issues.apache.org/jira/browse/HDFS-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800151#comment-17800151 ] ASF GitHub Bot commented on HDFS-17293: --- Hexiaoqiao commented on code in PR #6368: URL: https://github.com/apache/hadoop/pull/6368#discussion_r1435791414 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java: ## @@ -536,8 +536,13 @@ protected void adjustChunkBoundary() { } if (!getStreamer().getAppendChunk()) { - final int psize = (int) Math - .min(blockSize - getStreamer().getBytesCurBlock(), writePacketSize); + int psize = 0; + if (blockSize == getStreamer().getBytesCurBlock()) { Review Comment: I haven't thought about it carefully, but my first impression is that this is a good improvement. cc @zhangshuyan0 Would you mind taking another review? @hfutatzhanghb It would be better to add unit tests to cover this case. Thanks. > First packet data + checksum size will be set to 516 bytes when writing to a > new block. > --- > > Key: HDFS-17293 > URL: https://issues.apache.org/jira/browse/HDFS-17293 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 3.3.6 >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > > First packet size will be set to 516 bytes when writing to a new block. > In method computePacketChunkSize, the parameters psize and csize would be > (0, 512) > when writing to a new block. It would be better to use writePacketSize. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17280) Pipeline recovery should better end block in advance when bytes acked greater than half of blocksize.
[ https://issues.apache.org/jira/browse/HDFS-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800150#comment-17800150 ] ASF GitHub Bot commented on HDFS-17280: --- Hexiaoqiao commented on PR #6336: URL: https://github.com/apache/hadoop/pull/6336#issuecomment-1868468065 @hfutatzhanghb Thanks for your contribution! Sorry, I didn't fully understand this proposal. Would you mind offering some more information about what issue you met and what this PR could do? Thanks again. > Pipeline recovery should better end block in advance when bytes acked greater > than half of blocksize. > - > > Key: HDFS-17280 > URL: https://issues.apache.org/jira/browse/HDFS-17280 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs-client >Reporter: farmmamba >Assignee: farmmamba >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org