[jira] [Commented] (HDFS-17314) Add a metric to record congestion backoff counts.
[ https://issues.apache.org/jira/browse/HDFS-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802039#comment-17802039 ] ASF GitHub Bot commented on HDFS-17314:
---
hfutatzhanghb commented on PR #6398: URL: https://github.com/apache/hadoop/pull/6398#issuecomment-1874945130

Updated with unit test.

> Add a metric to record congestion backoff counts.
> --
>
> Key: HDFS-17314
> URL: https://issues.apache.org/jira/browse/HDFS-17314
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Affects Versions: 3.3.6
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Major
> Labels: pull-request-available
>
> When congestion backoff is enabled, we need to know how many times datanodes
> have told the client to back off. This metric helps us better understand the
> congestion backoff feature.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
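The metric proposed above is essentially a counter bumped every time a datanode tells the client to back off. A minimal sketch of that idea, using a plain `AtomicLong` rather than Hadoop's metrics2 classes — the class and method names here are illustrative, not the names used in the actual patch:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch of a congestion-backoff counter: incremented whenever a datanode
 * signals congestion and the client backs off, so operators can see how
 * often backoff actually fires. Names are illustrative only.
 */
class CongestionBackoffMetrics {
    private final AtomicLong congestionBackoffCount = new AtomicLong();

    /** Would be called when a datanode's pipeline ACK indicates congestion. */
    void onCongestionSignal() {
        congestionBackoffCount.incrementAndGet();
    }

    long getCongestionBackoffCount() {
        return congestionBackoffCount.get();
    }
}
```

In the real patch the counter would live in Hadoop's metrics2 framework so it is exported alongside the other client metrics; the thread-safe increment is the only essential part.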
[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need to be closed multiple times
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802034#comment-17802034 ] ASF GitHub Bot commented on HDFS-17317:
---
xuzifu666 commented on PR #6402: URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874933861

> I have triggered a new build: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6402/2/
> we need to get a green build before we can merge

OK, thanks.

> DebugAdmin metaOut does not need to be closed multiple times
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: xy
> Priority: Major
> Labels: pull-request-available
>
> DebugAdmin metaOut does not need to be closed multiple times
[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need to be closed multiple times
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802033#comment-17802033 ] ASF GitHub Bot commented on HDFS-17317:
---
ayushtkn commented on PR #6402: URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874932858

I have triggered a new build: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6402/2/
We need to get a green build before we can merge.

> DebugAdmin metaOut does not need to be closed multiple times
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: xy
> Priority: Major
> Labels: pull-request-available
>
> DebugAdmin metaOut does not need to be closed multiple times
[jira] [Commented] (HDFS-17305) Add avoid datanode reason count related metrics to namenode.
[ https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802030#comment-17802030 ] ASF GitHub Bot commented on HDFS-17305:
---
huangzhaobo99 commented on PR #6393: URL: https://github.com/apache/hadoop/pull/6393#issuecomment-1874927909

Hi @tasanuma @ayushtkn, please kindly review this PR as well if you have bandwidth. Thanks.

> Add avoid datanode reason count related metrics to namenode.
> --
>
> Key: HDFS-17305
> URL: https://issues.apache.org/jira/browse/HDFS-17305
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: huangzhaobo99
> Assignee: huangzhaobo99
> Priority: Minor
> Labels: pull-request-available
>
> There are now slow-node and load avoidance functions, implemented mainly in
> the BlockPlacementPolicyDefault class.
> 1. After an exclusion condition is triggered, some logs are printed on the
> NameNode, which can be used to troubleshoot anomalies by checking the logs;
> the code is as follows:
> {code:java}
> ...
> if (!node.isInService()) {
>   logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
>   return false;
> }
> if (avoidStaleNodes) {
>   if (node.isStale(this.staleInterval)) {
>     logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
>     return false;
>   }
> }
> ...{code}
> 2. When an exclusion condition is triggered, we can record it through metrics
> and count the total number of exclusions.
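The idea in point 2 — record each exclusion next to `logNodeIsNotChosen(...)` and keep a per-reason total — can be sketched with a map of per-reason counters. This is an illustrative sketch, not the PR's code; only NOT_IN_SERVICE and NODE_STALE appear in the snippet above, and the counter class here is plain JDK rather than Hadoop metrics2:

```java
import java.util.EnumMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Sketch of per-reason "node not chosen" counters. incr(...) would be
 * called alongside logNodeIsNotChosen(...) in the placement policy.
 */
class AvoidNodeMetrics {
    // NOT_IN_SERVICE and NODE_STALE come from the snippet above; the rest
    // of BlockPlacementPolicyDefault's reasons would be added the same way.
    enum NodeNotChosenReason { NOT_IN_SERVICE, NODE_STALE }

    private final EnumMap<NodeNotChosenReason, LongAdder> counts =
        new EnumMap<>(NodeNotChosenReason.class);

    AvoidNodeMetrics() {
        for (NodeNotChosenReason r : NodeNotChosenReason.values()) {
            counts.put(r, new LongAdder()); // pre-populate so incr() never misses
        }
    }

    void incr(NodeNotChosenReason reason) {
        counts.get(reason).increment();
    }

    long get(NodeNotChosenReason reason) {
        return counts.get(reason).longValue();
    }
}
```

`LongAdder` is a good fit here because the counters are written from many chooser threads but only read when metrics are scraped.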
[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802029#comment-17802029 ] ASF GitHub Bot commented on HDFS-17309:
---
LiuGuH commented on code in PR #6390: URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440125583

hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:

@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment: Good idea, make it initialized in the constructor. Thanks.

> RBF: Fix Router Safemode check condition error
> --
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> With HDFS-17116, the Router safemode check condition uses monotonicNow().
> For the code in RouterSafemodeService.periodicInvoke():
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>
> monotonicNow() is implemented with System.nanoTime(). From the
> System.nanoTime() javadoc:
> This method can only be used to measure elapsed time and is not related to
> any other notion of system or wall-clock time. The value returned represents
> nanoseconds since some fixed but arbitrary origin time (perhaps in the
> future, so values may be negative).
>
> The following situation may exist:
> If refreshCaches does not succeed at the beginning, cacheUpdateTime will be
> 0, and now - cacheUpdateTime is measured from an arbitrary origin, so
> isCacheStale may be either true or false.
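The root cause described above is that System.nanoTime() has an arbitrary origin, so subtracting a default timestamp of 0 yields a meaningless "elapsed" value. One way to make the staleness check well-defined from the start is to seed the timestamp from the same monotonic clock in the constructor. A minimal sketch — monotonicNow() mirrors how Hadoop's Time.monotonicNow() is derived from nanoTime, the rest of the class is illustrative:

```java
import static java.util.concurrent.TimeUnit.NANOSECONDS;

/**
 * Sketch of a monotonic-clock staleness check. Because nanoTime()'s origin
 * is arbitrary, comparing "now" against an uninitialized 0 is undefined;
 * seeding the timestamp at construction keeps (now - last) a real interval.
 */
class CacheFreshness {
    private final long staleIntervalMs;
    private long cacheLastUpdateTime; // monotonic milliseconds

    CacheFreshness(long staleIntervalMs) {
        this.staleIntervalMs = staleIntervalMs;
        this.cacheLastUpdateTime = monotonicNow(); // not a default of 0
    }

    /** Same derivation as Hadoop's Time.monotonicNow(). */
    static long monotonicNow() {
        return NANOSECONDS.toMillis(System.nanoTime());
    }

    void onCacheRefreshed() {
        cacheLastUpdateTime = monotonicNow();
    }

    boolean isCacheStale() {
        return (monotonicNow() - cacheLastUpdateTime) > staleIntervalMs;
    }
}
```

With a 0 default, `now - cacheUpdateTime` could be a huge positive number or negative depending on where the JVM's nanoTime origin happens to fall, which is exactly the true-or-false ambiguity the issue describes.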
[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
[ https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802027#comment-17802027 ] ASF GitHub Bot commented on HDFS-17302:
---
KeeProMise commented on PR #6380: URL: https://github.com/apache/hadoop/pull/6380#issuecomment-1874924983

@goiri @simbadzina hi, could you please help to review? Thanks a lot!

> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> ---
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: rbf
> Reporter: Jian Zhang
> Assignee: Jian Zhang
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch, HDFS-17302.003.patch
>
> h2. Current shortcomings
> [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a StaticRouterRpcFairnessPolicyController to support configuring different handlers for different ns. Using the StaticRouterRpcFairnessPolicyController allows the router to isolate different ns, so an ns with a higher load will not affect the router's access to an ns with a normal load. But the StaticRouterRpcFairnessPolicyController still falls short in many ways, such as:
> 1. *Configuration is inconvenient and error-prone*: When I use StaticRouterRpcFairnessPolicyController, I first need to know how many handlers the router has in total, then how many nameservices the router currently has, and then carefully calculate how many handlers to allocate to each ns so that the sum of handlers across all ns does not exceed the total handlers of the router; I also need to consider how many handlers to allocate to each ns to achieve better performance. Therefore, I need to be very careful when configuring. Even if I configure just one more handler for a certain ns, so that the total exceeds the number of handlers owned by the router, the router will fail to start. At that point I have to investigate why the router failed to start and, after finding the reason, reconsider the number of handlers for each ns. In addition, when I reconfigure the total number of handlers on the router, I have to re-allocate handlers to each ns, which undoubtedly increases the complexity of operation and maintenance.
> 2. *Extension ns is not supported*: While the router is running, if a new ns is added to the cluster and a mount is added for it, the ns cannot be accessed through the router because no handler has been allocated for it. We must reconfigure the number of handlers and then refresh the configuration; only then can the router access the ns normally. And when we reconfigure the number of handlers, we again face disadvantage 1: configuration is inconvenient and error-prone.
> 3. *Wasted handlers*: The main purpose of proposing RouterRpcFairnessPolicyController is to let the router access an ns with normal load without being affected by an ns with higher load. First, not all ns have high loads; second, an ns with high load does not have high load 24 hours a day. It may be that only certain periods, such as 0:00 to 8:00, have high load, while other periods have normal load. Assume there are 2 ns and each is allocated half of the handlers. Assume ns1 has many requests from 0:00 to 14:00 and almost none from 14:00 to 24:00, while ns2 has many requests from 12:00 to 24:00 and almost none from 0:00 to 14:00; then between 0:00 and 12:00 and between 14:00 and 24:00, only one ns has many requests and the other has almost none, so we waste half of the handlers.
> 4. *Only isolation, no sharing*: The StaticRouterRpcFairnessPolicyController does not support sharing, only isolation. I think isolation is just a means to improve the performance of router access to normal ns, not the goal. It is impossible for all ns in a cluster to have high loads; on the contrary, in most scenarios only a few ns have high loads, and the loads of most other ns are normal. For an ns with higher load and an ns with normal load, we need to isolate their handlers so that the ns with higher load does not affect the performance of the ns with lower load. However, nameservices that are all under normal load, or all under higher load, do not need to be isolated from each other; these ns of the same nature can share the handlers of the router. The performance is better than assigning a fixed number of handlers to each ns, because each ns can use all
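The proportional alternative this issue argues for can be reduced to one line of arithmetic: each ns is configured with a fraction of the router's total handlers instead of an absolute count, so resizing the handler pool never requires re-balancing every entry, and fractions that sum to more than 1 permit sharing. A sketch under those assumptions — the class name, method signature, and rounding rule are illustrative, not the actual controller:

```java
/**
 * Sketch of proportional handler allocation: a nameservice's permit count
 * is derived from a configured fraction of the total, so changing the
 * router's total handler count automatically rescales every ns.
 * Illustrative only; not the ProportionRouterRpcFairnessPolicyController code.
 */
class ProportionAllocator {
    /**
     * totalHandlers * proportion, rounded down, with at least one handler
     * so a newly mounted ns is never completely unreachable.
     */
    static int handlersFor(int totalHandlers, double proportion) {
        return Math.max(1, (int) (totalHandlers * proportion));
    }
}
```

Under this scheme the misconfiguration in shortcoming 1 (per-ns counts summing past the total) cannot make the router fail to start, and adding a new ns (shortcoming 2) only needs one new fraction rather than a full re-balance.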
[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802023#comment-17802023 ] ASF GitHub Bot commented on HDFS-17309:
---
slfan1989 commented on code in PR #6390: URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440105699

hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:

@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment: Thanks for the explanation! But the code looks weird because the other variables are initialized in the constructor; can we initialize to `0` in the constructor? Just a personal opinion, let's wait for goiri's view.

> RBF: Fix Router Safemode check condition error
> --
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> With HDFS-17116, the Router safemode check condition uses monotonicNow().
> For the code in RouterSafemodeService.periodicInvoke():
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>
> monotonicNow() is implemented with System.nanoTime(). From the
> System.nanoTime() javadoc:
> This method can only be used to measure elapsed time and is not related to
> any other notion of system or wall-clock time. The value returned represents
> nanoseconds since some fixed but arbitrary origin time (perhaps in the
> future, so values may be negative).
>
> The following situation may exist:
> If refreshCaches does not succeed at the beginning, cacheUpdateTime will be
> 0, and now - cacheUpdateTime is measured from an arbitrary origin, so
> isCacheStale may be either true or false.
[jira] [Commented] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan
[ https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802022#comment-17802022 ] ASF GitHub Bot commented on HDFS-17310:
---
slfan1989 commented on PR #6391: URL: https://github.com/apache/hadoop/pull/6391#issuecomment-1874893095

> Thanks @slfan1989 @ashutoshcipher for helping me review it.
> Could you push this modification forward when you have free time? Thank you very much.

@haiyang1987 Thanks for the contribution! LGTM. But should we wait 1-2 working days for tasanuma to help review the PR? cc: @ashutoshcipher

> DiskBalancer: Enhance the log message for submitPlan
> --
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> To make troubleshooting more convenient, enhance the log message for
> submitPlan.
[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802011#comment-17802011 ] ASF GitHub Bot commented on HDFS-17306:
---
hadoop-yetus commented on PR #6385: URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874843069

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 20s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 32m 19s | | trunk passed |
| +1 :green_heart: | compile | 0m 22s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 0m 22s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 0m 19s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 27s | | trunk passed |
| +1 :green_heart: | javadoc | 0m 28s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 18s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 50s | | trunk passed |
| +1 :green_heart: | shadedclient | 19m 35s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 20s | | the patch passed |
| +1 :green_heart: | compile | 0m 21s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 0m 21s | | the patch passed |
| +1 :green_heart: | compile | 0m 17s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 0m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 11s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 22s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 18s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 0m 16s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 0m 50s | | the patch passed |
| +1 :green_heart: | shadedclient | 19m 29s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 15s | | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 0m 24s | | The patch does not generate ASF License warnings. |
| | | | 99m 47s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/6385 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux da6d00d6c07f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / c0e9750ff4cc8d86495cfc85f67261d9b7e7d4e2 |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/testReport/ |
| Max. process+thread count | 2624 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

> RBF: Router should not return nameservices that do not enable observer nodes
>
[jira] [Commented] (HDFS-17317) DebugAdmin metaOut does not need to be closed multiple times
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801997#comment-17801997 ] ASF GitHub Bot commented on HDFS-17317:
---
xuzifu666 commented on PR #6402: URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874801552

@ayushtkn Thanks for your review. Could you help to merge it? CI seems to hang.

> DebugAdmin metaOut does not need to be closed multiple times
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: xy
> Priority: Major
> Labels: pull-request-available
>
> DebugAdmin metaOut does not need to be closed multiple times
[jira] [Commented] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.
[ https://issues.apache.org/jira/browse/HDFS-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801996#comment-17801996 ] ASF GitHub Bot commented on HDFS-17311:
---
LiuGuH commented on code in PR #6392: URL: https://github.com/apache/hadoop/pull/6392#discussion_r1440027525

hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionManager.java:

@@ -229,7 +229,7 @@ public ConnectionContext getConnection(UserGroupInformation ugi,
   // Add a new connection to the pool if it wasn't usable
   if (conn == null || !conn.isUsable()) {
-    if (!this.creatorQueue.offer(pool)) {
+    if (!this.creatorQueue.contains(pool) && !this.creatorQueue.offer(pool)) {

Review Comment: This prevents a duplicate pool from being added to the creatorQueue. getConnection() is called concurrently, so the creatorQueue can be given the same pool more than once.

> RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.
> ---
>
> Key: HDFS-17311
> URL: https://issues.apache.org/jira/browse/HDFS-17311
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> 2023-12-29 15:18:54,799 ERROR org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add more than 2048 connections at the same time
> In my environment, the ConnectionManager creatorQueue is full, but the cluster does not have so many users that it could reach up to 2048 pair of in the router.
> With a large number of concurrent calls, the creatorQueue may be given the same pool more than once.
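One thing worth noting about the diff above: `!creatorQueue.contains(pool) && creatorQueue.offer(pool)` is a check-then-act pair, so two threads can both pass `contains()` before either offers; it bounds duplicates rather than eliminating them. A companion concurrent set makes the deduplication atomic. The sketch below is illustrative only, not the ConnectionManager code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Sketch of a creator queue that admits each pending pool at most once.
 * Set.add() is atomic, so concurrent callers cannot both enqueue the
 * same pool, unlike a contains()-then-offer() sequence.
 */
class DedupCreatorQueue<T> {
    private final LinkedBlockingQueue<T> queue;
    private final Set<T> enqueued = ConcurrentHashMap.newKeySet();

    DedupCreatorQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Returns false if the element is already pending or the queue is full. */
    boolean offerIfAbsent(T pool) {
        if (!enqueued.add(pool)) {
            return false; // already pending creation
        }
        if (!queue.offer(pool)) {
            enqueued.remove(pool); // queue full; allow a later retry
            return false;
        }
        return true;
    }

    /** Consumer side: removing from the set re-enables future offers. */
    T take() throws InterruptedException {
        T pool = queue.take();
        enqueued.remove(pool);
        return pool;
    }
}
```

Even as a non-atomic check, the PR's `contains()` guard should keep the queue from filling with thousands of copies of the same pool, which is the symptom in the error log above.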
[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801995#comment-17801995 ] ASF GitHub Bot commented on HDFS-17306:
---
LiuGuH commented on PR #6385: URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874794589

> @LiuGuH Thanks for your contribution! We need to fix checkstyle.

Thanks for the review. Fixed.

> RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: liuguanghua
> Assignee: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> Suppose a cluster has 3 nameservices (ns1, ns2, ns3), only ns1 has observer
> nodes, and clients communicate with the NameNodes via DFSRouter.
> If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will receive
> all nameservices in RpcResponseHeaderProto.
> We should reduce the RPC response size when nameservices do not enable
> observer nodes.
[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801994#comment-17801994 ] ASF GitHub Bot commented on HDFS-17309:
---
LiuGuH commented on code in PR #6390: URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440022147

hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:

@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment: Thanks for the review. Yes, the default value of long is 0, but I think it is better to assign 0 explicitly to emphasize that initialization is complete.

> RBF: Fix Router Safemode check condition error
> --
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: liuguanghua
> Priority: Major
> Labels: pull-request-available
>
> With HDFS-17116, the Router safemode check condition uses monotonicNow().
> For the code in RouterSafemodeService.periodicInvoke():
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>
> monotonicNow() is implemented with System.nanoTime(). From the
> System.nanoTime() javadoc:
> This method can only be used to measure elapsed time and is not related to
> any other notion of system or wall-clock time. The value returned represents
> nanoseconds since some fixed but arbitrary origin time (perhaps in the
> future, so values may be negative).
>
> The following situation may exist:
> If refreshCaches does not succeed at the beginning, cacheUpdateTime will be
> 0, and now - cacheUpdateTime is measured from an arbitrary origin, so
> isCacheStale may be either true or false.
[jira] [Commented] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan
[ https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801989#comment-17801989 ] ASF GitHub Bot commented on HDFS-17310:
---
haiyang1987 commented on PR #6391: URL: https://github.com/apache/hadoop/pull/6391#issuecomment-1874777837

Thanks @slfan1989 @ashutoshcipher for helping me review it. Could you push this modification forward when you have free time? Thank you very much.

> DiskBalancer: Enhance the log message for submitPlan
> --
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> To make troubleshooting more convenient, enhance the log message for
> submitPlan.
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801979#comment-17801979 ] ASF GitHub Bot commented on HDFS-17290:
---
li-leyang commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874756402

@simbadzina Please take a look. The Yetus warning is fixed.

> HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
> ---
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.10.0, 3.4.0
> Reporter: Lei Yang
> Assignee: Lei Yang
> Priority: Major
> Labels: pull-request-available
>
> Clients back off when RPCs cannot be enqueued, but backoff can happen in
> different scenarios. Currently there is no way to differentiate whether a
> backoff happened due to lowest-priority-queue overflow plus disconnection,
> or due to overflow from higher priority queues while the connection between
> client and namenode remains open. The IPC server currently emits a single
> metric for all backoffs.
> Example:
> # Clients are directly enqueued into the lowest priority queue and back off
> when the lowest queue is full. Clients are expected to disconnect from the
> namenode.
> # Clients are enqueued into a non-lowest priority queue, overflow all the
> way down to the lowest priority queue, and back off. In this case, the
> connection between client and namenode remains open.
> We would like to add metrics for #1.
[jira] [Commented] (HDFS-17313) dfsadmin -reconfig option to start/query reconfig on all live namenodes.
[ https://issues.apache.org/jira/browse/HDFS-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801972#comment-17801972 ] ASF GitHub Bot commented on HDFS-17313:
---
huangzhaobo99 commented on PR #6395: URL: https://github.com/apache/hadoop/pull/6395#issuecomment-1874739359

Hi @tomscut @virajjasani, if you have time, please help me review the code.

> dfsadmin -reconfig option to start/query reconfig on all live namenodes.
> --
>
> Key: HDFS-17313
> URL: https://issues.apache.org/jira/browse/HDFS-17313
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: huangzhaobo99
> Assignee: huangzhaobo99
> Priority: Major
> Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/HDFS-16568 supports batch refreshing
> of datanode configurations. There are several NameNodes in an HA or
> federated cluster, and this ticket implements batch refreshing of NameNode
> configurations.
> *Implementation method*
> # Use the DFSUtil.getNNServiceRpcAddressesForCluster method to parse the configuration and obtain the addresses of all NameNodes.
> # Use two worker threads; configuring the number of worker threads is not supported yet (it will be implemented in another ticket if necessary).
> *Sample outputs*
> {code:java}
> $ bin/hdfs dfsadmin -reconfig namenode livenodes start
> Started reconfiguration task on node [localhost:50034].
> Started reconfiguration task on node [localhost:50036].
> Started reconfiguration task on node [localhost:50038].
> Started reconfiguration task on node [localhost:50040].
> Starting of reconfiguration task successful on 4 nodes, failed on 0 nodes.
>
> $ bin/hdfs dfsadmin -reconfig namenode livenodes status
> Reconfiguring status for node [localhost:50034]
> SUCCESS: Changed property dfs.heartbeat.interval
> From: "5"
> To: "3"
> Reconfiguring status for node [localhost:50036]
> SUCCESS: Changed property dfs.heartbeat.interval
> From: "5"
> To: "3"
> Reconfiguring status for node [localhost:50038]
> SUCCESS: Changed property dfs.heartbeat.interval
> From: "5"
> To: "3"
> Reconfiguring status for node [localhost:50040]
> SUCCESS: Changed property dfs.heartbeat.interval
> From: "5"
> To: "3"
> Retrieval of reconfiguration status successful on 4 nodes, failed on 0
> nodes.{code}
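The batch-start flow described above (resolve every NameNode address, fan the start call out over two worker threads, report success and failure counts) can be sketched as follows. `startReconfigOn(...)` is a hypothetical stand-in for the real per-NameNode dfsadmin RPC, and the class name is illustrative:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of starting reconfiguration on every namenode with a fixed pool
 * of two workers, mirroring the two worker threads described in the ticket.
 */
class BatchReconfig {
    /** Returns how many namenodes reported a successful start. */
    static int startOnAll(List<String> nnAddresses) {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        AtomicInteger succeeded = new AtomicInteger();
        for (String addr : nnAddresses) {
            workers.execute(() -> {
                if (startReconfigOn(addr)) { // hypothetical stand-in for the RPC
                    succeeded.incrementAndGet();
                }
            });
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return succeeded.get();
    }

    // Hypothetical per-namenode call; always "succeeds" in this sketch.
    static boolean startReconfigOn(String addr) {
        System.out.println("Started reconfiguration task on node [" + addr + "].");
        return true;
    }
}
```

A summary line like "Starting of reconfiguration task successful on N nodes, failed on 0 nodes." then falls out of the success count versus the address-list size.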
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801970#comment-17801970 ] ASF GitHub Bot commented on HDFS-17290: --- hadoop-yetus commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874720576 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 18m 58s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 47m 7s | | trunk passed | | +1 :green_heart: | compile | 18m 14s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 16m 27s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 19s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 38s | | trunk passed | | +1 :green_heart: | javadoc | 1m 13s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 49s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 38s | | trunk passed | | +1 :green_heart: | shadedclient | 39m 48s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 55s | | the patch passed | | +1 :green_heart: | compile | 17m 14s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 17m 14s | | the patch passed | | +1 :green_heart: | compile | 16m 23s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 16m 22s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 13s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 215 unchanged - 0 fixed = 217 total (was 215) | | +1 :green_heart: | mvnsite | 1m 36s | | the patch passed | | +1 :green_heart: | javadoc | 1m 6s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 39s | | the patch passed | | +1 :green_heart: | shadedclient | 39m 52s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 22s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 253m 0s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6359 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint | | uname | Linux 4de20a61222e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/testReport/ | | Max. process+thread count | 3137 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U:
[jira] [Commented] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.
[ https://issues.apache.org/jira/browse/HDFS-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801962#comment-17801962 ] ASF GitHub Bot commented on HDFS-17311: --- slfan1989 commented on code in PR #6392: URL: https://github.com/apache/hadoop/pull/6392#discussion_r1439961113 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionManager.java: ## @@ -229,7 +229,7 @@ public ConnectionContext getConnection(UserGroupInformation ugi, // Add a new connection to the pool if it wasn't usable if (conn == null || !conn.isUsable()) { - if (!this.creatorQueue.offer(pool)) { + if (!this.creatorQueue.contains(pool) && !this.creatorQueue.offer(pool)) { Review Comment: Sorry, I don't understand the meaning of this change. Can you explain the reason for it? > RBF: ConnectionManager creatorQueue should offer a pool that is not already > in creatorQueue. > > > Key: HDFS-17311 > URL: https://issues.apache.org/jira/browse/HDFS-17311 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > 2023-12-29 15:18:54,799 ERROR > org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add > more than 2048 connections at the same time > In my environment, the ConnectionManager creatorQueue is full, but the cluster > does not have so many users that it could reach 2048 pair of in router. > In the case of a large number of concurrent requests, creatorQueue may add the same pool more > than once. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
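The diff under review adds a `contains()` guard before `offer()`. A self-contained sketch of the effect, with a plain `BlockingQueue<String>` standing in for the router's queue of `ConnectionPool`s (the method name and pool key here are hypothetical):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CreatorQueueSketch {
    // Bounded like the router's creatorQueue (2048 entries).
    static final BlockingQueue<String> creatorQueue = new ArrayBlockingQueue<>(2048);

    // Mirrors the patched condition: only enqueue a pool that is not already queued.
    static boolean requestNewConnection(String pool) {
        if (!creatorQueue.contains(pool) && !creatorQueue.offer(pool)) {
            // Queue genuinely full: this is where the "Cannot add more than
            // 2048 connections at the same time" error would be logged.
            return false;
        }
        return true;
    }
}
```

Note that `contains()` followed by `offer()` is not atomic, so two concurrent callers can still enqueue the same pool twice; the guard reduces duplicates rather than eliminating them, which may be the point behind the reviewer's question.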
[jira] [Assigned] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shilun Fan reassigned HDFS-17306: - Assignee: liuguanghua > RBF: Router should not return nameservices that do not enable observer nodes > in RpcResponseHeaderProto > --- > > Key: HDFS-17306 > URL: https://issues.apache.org/jira/browse/HDFS-17306 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Assignee: liuguanghua >Priority: Major > Labels: pull-request-available > > If a cluster has 3 nameservices (ns1, ns2, ns3), ns1 has observer > nodes, and clients communicate with the NameNodes via DFSRouter. > If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will > receive all nameservices in RpcResponseHeaderProto. > We should reduce the rpc response size for nameservices that don't enable > observer nodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-17306) RBF: Router should not return nameservices that do not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801955#comment-17801955 ] ASF GitHub Bot commented on HDFS-17306: --- slfan1989 commented on PR #6385: URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874674522 @LiuGuH Thanks for your contribution! We need to fix checkstyle. > RBF: Router should not return nameservices that do not enable observer nodes > in RpcResponseHeaderProto > --- > > Key: HDFS-17306 > URL: https://issues.apache.org/jira/browse/HDFS-17306 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > If a cluster has 3 nameservices (ns1, ns2, ns3), ns1 has observer > nodes, and clients communicate with the NameNodes via DFSRouter. > If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY is enabled, the client will > receive all nameservices in RpcResponseHeaderProto. > We should reduce the rpc response size for nameservices that don't enable > observer nodes. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
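The proposed trimming amounts to filtering the per-nameservice state map before it is attached to the RpcResponseHeaderProto. The map of state ids and the set of observer-enabled nameservices below are hypothetical stand-ins for the router's actual data structures, used only to show the shape of the filter:

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ObserverNsFilterSketch {
    // Keep only nameservices that actually have observer reads enabled,
    // so clients are not sent state for ns2/ns3 when only ns1 has observers.
    static Map<String, Long> filterStateIds(Map<String, Long> nsStateIds,
                                            Set<String> observerEnabledNs) {
        return nsStateIds.entrySet().stream()
            .filter(e -> observerEnabledNs.contains(e.getKey()))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```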
[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check condition error
[ https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801954#comment-17801954 ] ASF GitHub Bot commented on HDFS-17309: --- slfan1989 commented on code in PR #6390: URL: https://github.com/apache/hadoop/pull/6390#discussion_r1439936960 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java: ## @@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService { /** Service to maintain State Store caches. */ private StateStoreCacheUpdateService cacheUpdater; /** Time the cache was last successfully updated. */ - private long cacheLastUpdateTime; + private long cacheLastUpdateTime = 0; Review Comment: Is this change necessary? The default value of long is `0`. > RBF: Fix Router Safemode check condition error > > > Key: HDFS-17309 > URL: https://issues.apache.org/jira/browse/HDFS-17309 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > With HDFS-17116, the Router safemode check condition uses monotonicNow(). > For the code in RouterSafemodeService.periodicInvoke(): > long now = monotonicNow(); > long cacheUpdateTime = stateStore.getCacheUpdateTime(); > boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval; > > The function monotonicNow() is implemented with System.nanoTime(). > System.nanoTime() in the javadoc description: > This method can only be used to measure elapsed time and is not related to > any other notion of system or wall-clock time. The value returned represents > nanoseconds since some fixed but arbitrary origin time (perhaps in the > future, so values may be negative). > > The following situation may exist: > If refreshCaches does not succeed in the beginning, cacheUpdateTime will be 0, > and now - cacheUpdateTime is measured from an arbitrary origin, so isCacheStale may be > true or false. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
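The staleness check quoted above can be made deterministic with an explicit guard for the never-refreshed case. This is one possible fix, sketched here for illustration — not the actual patch under review:

```java
public class SafemodeStaleCheckSketch {
    // Sketch of the check in RouterSafemodeService.periodicInvoke(), with an
    // explicit guard for "cache never refreshed" (cacheUpdateTime == 0).
    // System.nanoTime()-based clocks have an arbitrary origin (possibly
    // negative), so comparing a nanoTime-derived "now" against the default
    // field value 0 yields an arbitrary true/false result.
    static boolean isCacheStale(long now, long cacheUpdateTime, long staleInterval) {
        if (cacheUpdateTime <= 0) {
            return true; // never successfully refreshed: always treat as stale
        }
        return (now - cacheUpdateTime) > staleInterval;
    }
}
```

With this guard, a negative `now` (legal for `System.nanoTime()`) no longer makes a never-refreshed cache look fresh.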
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801949#comment-17801949 ] ASF GitHub Bot commented on HDFS-17290: --- hadoop-yetus commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874656850 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 8m 37s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 50s | | trunk passed | | +1 :green_heart: | compile | 9m 26s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 8m 59s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 40s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 51s | | trunk passed | | +1 :green_heart: | javadoc | 0m 40s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 26s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 34s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 16s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 31s | | the patch passed | | +1 :green_heart: | compile | 9m 14s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 9m 14s | | the patch passed | | +1 :green_heart: | compile | 8m 52s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 8m 52s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 33s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 215 unchanged - 0 fixed = 217 total (was 215) | | +1 :green_heart: | mvnsite | 0m 49s | | the patch passed | | +1 :green_heart: | javadoc | 0m 32s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 25s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 37s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 22s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 15m 51s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. 
| | | | 147m 12s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6359 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint | | uname | Linux 84f1b153d50d 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/testReport/ | | Max. process+thread count | 3150 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U:
[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue
[ https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801948#comment-17801948 ] ASF GitHub Bot commented on HDFS-17290: --- hadoop-yetus commented on PR #6359: URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874656622 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 25s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | markdownlint | 0m 0s | | markdownlint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 31m 57s | | trunk passed | | +1 :green_heart: | compile | 9m 34s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 8m 58s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 38s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 52s | | trunk passed | | +1 :green_heart: | javadoc | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 27s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 35s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 23s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 32s | | the patch passed | | +1 :green_heart: | compile | 9m 12s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 9m 12s | | the patch passed | | +1 :green_heart: | compile | 8m 50s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 8m 50s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 32s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 2 new + 215 unchanged - 0 fixed = 217 total (was 215) | | +1 :green_heart: | mvnsite | 0m 52s | | the patch passed | | +1 :green_heart: | javadoc | 0m 34s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 35s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 18s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 15m 36s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. 
| | | | 138m 41s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6359 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint | | uname | Linux a4fd3515e5e9 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/testReport/ | | Max. process+thread count | 5218 (vs. ulimit of 5500) | | modules | C: hadoop-common-project/hadoop-common U:
[jira] [Commented] (HDFS-17316) Compatibility Benchmark over HCFS Implementations
[ https://issues.apache.org/jira/browse/HDFS-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801917#comment-17801917 ] Steve Loughran commented on HDFS-17316: --- I'd propose decoupling this from the core hadoop/ source tree so it can be built against 3.3 and bq. there is no formal suite to do compatibility assessment of a file system for all such HCFS implementations. Thus, whether the functionality is well accomplished and meets the core compatible expectations mainly relies on service provider's own report. # filesystem contract tests are designed to do this from junit; if your FS implementation doesn't subclass and run these, you need to start there. # filesystem API specification is intended to specify the API and document where problems surface. maintenance there always welcome -and as the contract tests are derived from it, enhancements in those tests to follow # there's also terasort to validate commit protocols # + distcp contract tests for its semantics # dfsio does a lot, but needs maintenance -it only targets the clusterfs, when really you should be able to point at cloud storage from your own computer. extending that to take a specific target fs would be good. # output must go into the classic ant junit xml format so jenkins can present it. We can create a new hadoop git repo for this. Do you have existing code and any detailed specification/docs? this also allows you to add dependencies on other things, e.g. spark. > Compatibility Benchmark over HCFS Implementations > - > > Key: HDFS-17316 > URL: https://issues.apache.org/jira/browse/HDFS-17316 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Han Liu >Priority: Major > > {*}Background:{*} Hadoop-Compatible File System (HCFS) is a core concept in > the big data storage ecosystem, providing unified interfaces and generally clear > semantics, and has become the de facto standard for industry storage systems > to follow and conform with. 
There have been a series of HCFS implementations > in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for > Microsoft's Azure Blob Storage and the OSS connector for Alibaba Cloud Object > Storage, and more from storage service providers on their own. > {*}Problems:{*} However, as indicated by introduction.md, there is no formal > suite to do compatibility assessment of a file system for all such HCFS > implementations. Thus, whether the functionality is well accomplished and > meets the core compatible expectations mainly relies on the service provider's > own report. Meanwhile, Hadoop is also developing and new features are > continuously being contributed to HCFS interfaces for existing implementations to > follow and update, in which case Hadoop also needs a tool to quickly assess > whether these features are supported or not for a specific HCFS implementation. > Besides, the known hadoop command line tool or hdfs shell is used to directly > interact with an HCFS storage system, where most commands correspond to > specific HCFS interfaces and work well. Still, there are cases that are > complicated and may not work, like the expunge command. To check such commands > for an HCFS, we also need an approach to figure them out. > {*}Proposal:{*} Accordingly, we propose to define a formal HCFS compatibility > benchmark and provide a corresponding tool to do the compatibility assessment > for an HCFS storage system. The benchmark and tool should consider both HCFS > interfaces and hdfs shell commands. Different scenarios require different > kinds of compatibilities. For such consideration, we could define different > suites in the benchmark. > *Benefits:* We intend the benchmark and tool to be useful for both storage > providers and storage users. For end users, it can be used to evaluate the > compatibility level and determine if the storage system in question is > suitable for the required scenarios. 
For storage providers, it helps to > quickly generate an objective and reliable report about core functions of > the storage service. As an instance, if the HCFS got 100% on a suite named > 'tpcds', it is demonstrated that all functions needed by a tpcds program have > been well achieved. It is also a guide indicating how storage service > abilities can map to HCFS interfaces, such as storage class on S3. > Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
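The per-suite scoring the proposal describes ("got 100% on a suite named 'tpcds'") amounts to running a list of named checks and reporting a pass percentage. This toy harness shows that shape only; all names are hypothetical and no real HCFS calls are made:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

public class HcfsSuiteSketch {
    // Run every check in a named suite and return the pass percentage.
    static int runSuite(Map<String, BooleanSupplier> checks) {
        long passed = checks.values().stream()
            .filter(BooleanSupplier::getAsBoolean)
            .count();
        return (int) (100 * passed / checks.size());
    }
}
```

In a real benchmark each `BooleanSupplier` would wrap one HCFS interface call or one hdfs shell command (e.g. the problematic `expunge` case mentioned above), and the results would be emitted in junit xml so CI can present them.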
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801889#comment-17801889 ] Rushabh Shah commented on HDFS-17299: - Thank you [~ayushtkn] > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Assignee: Ritesh >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 1230000 ms (20.5 mins) to detect that a datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find block locations that satisfy the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still considered alive by the namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) >
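The 20.5-minute figure quoted in the report follows from the standard NameNode dead-node formula, 2 * recheck-interval + 10 * heartbeat-interval. The concrete values below (recheck 600000 ms, heartbeat 3 s) are assumptions — the numbers quoted in the description appear truncated — but they are the values that reproduce 20.5 minutes:

```java
public class HeartbeatExpirySketch {
    // NameNode dead-node detection interval:
    // 2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval.
    static long expiryMs(long recheckMs, long heartbeatSec) {
        return 2 * recheckMs + 10 * heartbeatSec * 1000;
    }
}
```

With recheckMs = 600000 and heartbeatSec = 3, this gives 1230000 ms, i.e. 20.5 minutes during which the NameNode keeps handing out datanodes from the downed AZ.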
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801881#comment-17801881 ] Ayush Saxena commented on HDFS-17299: - Done!!! > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Assignee: Ritesh >Priority: Critical > Attachments: repro.patch > > > Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ. > Our configuration: > 1. We use 3 Availability Zones (AZs) for fault tolerance. > 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy. > 3. We use the following configuration parameters: > dfs.namenode.heartbeat.recheck-interval: 60 > dfs.heartbeat.interval: 3 > So it will take 1230000 ms (20.5 mins) to detect that a datanode is dead. > > Steps to reproduce: > # Bring down 1 AZ. > # HBase (HDFS client) tries to create a file (WAL file) and then calls > hflush on the newly created file. > # DataStreamer is not able to find block locations that satisfy the rack > placement policy (one copy in each rack which essentially means one copy in > each AZ) > # Since all the datanodes in that AZ are down but still considered alive by the namenode, > the client gets different datanodes but still all of them are in the same AZ. > See logs below. > # HBase is not able to create a WAL file and it aborts the region server. 
> > Relevant logs from hdfs client and namenode > > {noformat} > 2023-12-16 17:17:43,818 INFO [on default port 9000] FSNamesystem.audit - > allowed=trueugi=hbase/ (auth:KERBEROS) ip= > cmd=create src=/hbase/WALs/ dst=null > 2023-12-16 17:17:43,978 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652565_140946716, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,061 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,061 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874--1594838129323:blk_1214652565_140946716 > 2023-12-16 17:17:44,179 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK] > 2023-12-16 17:17:44,339 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652580_140946764, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,369 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at 
org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16 17:17:44,369 WARN [Thread-39087] hdfs.DataStreamer - Abandoning > BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764 > 2023-12-16 17:17:44,454 WARN [Thread-39087] hdfs.DataStreamer - Excluding > datanode > DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK] > 2023-12-16 17:17:44,522 INFO [on default port 9000] hdfs.StateChange - > BLOCK* allocate blk_1214652594_140946796, replicas=:50010, > :50010, :50010 for /hbase/WALs/ > 2023-12-16 17:17:44,712 INFO [Thread-39087] hdfs.DataStreamer - Exception in > createBlockOutputStream > java.io.IOException: Got error, status=ERROR, status message , ack with > firstBadLink as :50010 > at > org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113) > at > org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747) > at > org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651) > at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715) > 2023-12-16
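The 20.5-minute detection window in the description above follows from HDFS's dead-node formula, 2 × recheck interval + 10 × heartbeat interval. A minimal sketch (not Hadoop source), assuming dfs.namenode.heartbeat.recheck-interval = 600000 ms and dfs.heartbeat.interval = 3 s, which are the values that reproduce the quoted 20.5 minutes:

```java
// Sketch of how the NameNode's dead-node timeout is derived from the two
// heartbeat settings quoted in the issue description.
public class DeadNodeInterval {
    static long deadIntervalMs(long recheckMs, long heartbeatSec) {
        // HDFS marks a DataNode dead after 2 * recheck + 10 * heartbeat.
        return 2 * recheckMs + 10 * heartbeatSec * 1000;
    }

    public static void main(String[] args) {
        long ms = deadIntervalMs(600_000L, 3L);
        System.out.println(ms + " ms = " + (ms / 60_000.0) + " min"); // 1230000 ms = 20.5 min
    }
}
```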
[jira] [Assigned] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reassigned HDFS-17299: --- Assignee: Ritesh > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Assignee: Ritesh >Priority: Critical > Attachments: repro.patch
[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.
[ https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801867#comment-17801867 ] Rushabh Shah commented on HDFS-17299: - [~gargrite] is interested to work on this jira. [~ayushtkn] [~hexiaoqiao] Can one of you please add him to the Contributors list so that I can assign the Jira to him. Thank you! > HDFS is not rack failure tolerant while creating a new file. > > > Key: HDFS-17299 > URL: https://issues.apache.org/jira/browse/HDFS-17299 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.10.1 >Reporter: Rushabh Shah >Priority: Critical > Attachments: repro.patch
[jira] [Created] (HDFS-17319) Downgrade noisy InvalidToken log in ShortCircuitCache
Bryan Beaudreault created HDFS-17319: Summary: Downgrade noisy InvalidToken log in ShortCircuitCache Key: HDFS-17319 URL: https://issues.apache.org/jira/browse/HDFS-17319 Project: Hadoop HDFS Issue Type: Bug Reporter: Bryan Beaudreault ShortCircuitCache logs an exception whenever InvalidToken is detected (see below). As I understand it, this is part of normal operations when block tokens are enabled. So this log seems really noisy. I think we should downgrade it to DEBUG, or at least remove the stacktrace. It leads someone to think they have a problem when they don't. {code:java} 2024-01-02T16:02:51,621 [hedgedRead-1545] INFO org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: ShortCircuitCache(0xbac84bc): could not load 1437522350_BP-1420092181-ip-1658432093559 due to InvalidToken exception. org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting to set up short-circuit access to /hbase/data/default/hbase-table-1/23f85f2d91e4967ce389d1a09c43e46d/0/609ccd5d7fcb4830a6602ddaea5ed27e at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:651) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) ~[hadoop-hdfs-client-3.3.1.jar:?] 
at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1180) ~[hadoop-hdfs-client-3.3.1.jar:?] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
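The proposal above amounts to demoting the per-occurrence stack trace to DEBUG. A minimal sketch of that pattern (using java.util.logging so it runs standalone; Hadoop itself uses slf4j, and the method and message here are illustrative, not the actual ShortCircuitCache code):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class InvalidTokenLogging {
    private static final Logger LOG = Logger.getLogger("ShortCircuitCache");

    // Build the one-line message once so both branches stay consistent.
    static String message(String key, Exception e) {
        return "could not load " + key + " due to InvalidToken: " + e.getMessage();
    }

    static void reportInvalidToken(String key, Exception e) {
        if (LOG.isLoggable(Level.FINE)) {
            // Full stack trace only when debug-level logging is enabled.
            LOG.log(Level.FINE, message(key, e), e);
        } else {
            // Routine event with block tokens enabled: one line, no trace.
            LOG.info(message(key, e));
        }
    }

    public static void main(String[] args) {
        reportInvalidToken("1437522350_BP-1420092181",
            new SecurityException("access control error"));
    }
}
```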
[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801858#comment-17801858 ] ASF GitHub Bot commented on HDFS-17315: --- hadoop-yetus commented on PR #6400: URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1874275707 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 35s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 39s | | trunk passed | | +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 8s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 22s | | trunk passed | | +1 :green_heart: | javadoc | 1m 5s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 36s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | -1 :x: | spotbugs | 3m 18s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html) | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings. 
| | +1 :green_heart: | shadedclient | 35m 3s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 7s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 1m 5s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 56s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 12s | | the patch passed | | +1 :green_heart: | javadoc | 0m 52s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 27s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 14s | | hadoop-hdfs-project/hadoop-hdfs generated 0 new + 0 unchanged - 1 fixed = 0 total (was 1) | | +1 :green_heart: | shadedclient | 34m 21s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 217m 11s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 43s | | The patch does not generate ASF License warnings. 
| | | | 351m 48s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6400 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c1770dcc0068 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 303095018bd08b43a2043a12e62bde76961d354d | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions |
[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801832#comment-17801832 ] ASF GitHub Bot commented on HDFS-17306: --- hadoop-yetus commented on PR #6385: URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874212168 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 18m 16s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 46m 17s | | trunk passed | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | compile | 0m 34s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 29s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 41s | | trunk passed | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 30s | | trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 23s | | trunk passed | | +1 :green_heart: | shadedclient | 40m 55s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 35s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javac | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 30s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 30s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 18s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 2 unchanged - 0 fixed = 3 total (was 2) | | +1 :green_heart: | mvnsite | 0m 33s | | the patch passed | | +1 :green_heart: | javadoc | 0m 29s | | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 | | +1 :green_heart: | javadoc | 0m 24s | | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 28s | | the patch passed | | +1 :green_heart: | shadedclient | 38m 29s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 23m 9s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 35s | | The patch does not generate ASF License warnings. 
| | | | 181m 33s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/6385 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 7135c90f0234 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / b9fc917aa851c6479366be82d5b5cefcc11c4699 | | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/testReport/ | | Max. process+thread count | 2406 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output |
[jira] [Assigned] (HDFS-17318) RBF: MountTableResolver#locationCache supports multi policies
[ https://issues.apache.org/jira/browse/HDFS-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] farmmamba reassigned HDFS-17318: Assignee: farmmamba > RBF: MountTableResolver#locationCache supports multi policies > - > > Key: HDFS-17318 > URL: https://issues.apache.org/jira/browse/HDFS-17318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: farmmamba >Assignee: farmmamba >Priority: Major >
[jira] [Created] (HDFS-17318) MountTableResolver#locationCache supports multi policies
farmmamba created HDFS-17318: Summary: MountTableResolver#locationCache supports multi policies Key: HDFS-17318 URL: https://issues.apache.org/jira/browse/HDFS-17318 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Reporter: farmmamba
[jira] [Updated] (HDFS-17318) RBF: MountTableResolver#locationCache supports multi policies
[ https://issues.apache.org/jira/browse/HDFS-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] farmmamba updated HDFS-17318: - Summary: RBF: MountTableResolver#locationCache supports multi policies (was: MountTableResolver#locationCache supports multi policies) > RBF: MountTableResolver#locationCache supports multi policies > - > > Key: HDFS-17318 > URL: https://issues.apache.org/jira/browse/HDFS-17318 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Reporter: farmmamba >Priority: Major >
[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks
[ https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801791#comment-17801791 ] ASF GitHub Bot commented on HDFS-16420: --- LoseYSelf commented on PR #3880: URL: https://github.com/apache/hadoop/pull/3880#issuecomment-1874050010 > hello, @Jackson-Wang-7 Does this fix adapt to Hadoop 3.1 version? No > Avoid deleting unique data blocks when deleting redundancy striped blocks > - > > Key: HDFS-16420 > URL: https://issues.apache.org/jira/browse/HDFS-16420 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, erasure-coding >Reporter: qinyuren >Assignee: Jackson Wang >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: image-2022-01-10-17-31-35-910.png, > image-2022-01-10-17-32-56-981.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have a similar problem as HDFS-16297 described. > In our cluster, we used {color:#de350b}ec(6+3) + balancer with version > 3.1.0{color}, and the {color:#de350b}missing block{color} happened. > We got the block(blk_-9223372036824119008) info from fsck, only 5 live > replications and multiple redundant replications. > {code:java} > blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5 > blk_-9223372036824119007:DatanodeInfoWithStorage, > blk_-9223372036824119002:DatanodeInfoWithStorage, > blk_-9223372036824119001:DatanodeInfoWithStorage, > blk_-9223372036824119000:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage {code} > > We searched the log from all datanode, and found that the internal blocks of > blk_-9223372036824119008 were deleted almost at the same time. 
> > {code:java} > 08:15:58,550 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI > file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008 > 08:16:21,214 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI > file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006 > 08:16:55,737 INFO impl.FsDatasetAsyncDiskService > (FsDatasetAsyncDiskService.java:run(333)) - Deleted > BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI > file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005 > {code} > > The total number of internal blocks deleted during 08:15-08:17 are as follows > ||internal block||index|| delete num|| > |blk_-9223372036824119008 > blk_-9223372036824119006 > blk_-9223372036824119005 > blk_-9223372036824119004 > blk_-9223372036824119003 > blk_-9223372036824119000 |0 > 2 > 3 > 4 > 5 > 8| 1 > 1 > 1 > 50 > 1 > 1| > > {color:#ff}During 08:15 to 08:17, we restarted 2 datanode and triggered > full block report immediately.{color} > > There are 2 questions: > 1. Why are there so many replicas of this block? > 2. Why delete the internal block with only one copy? > The reasons for the first problem may be as follows: > 1. We set the full block report period of some datanode to 168 hours. > 2. We have done a namenode HA operation. > 3. After namenode HA, the state of storage became > {color:#ff}stale{color}, and the state not change until next full block > report. > 4. 
The balancer copied the replica without deleting the replica from source > node, because the source node have the stale storage, and the request was put > into {color:#ff}postponedMisreplicatedBlocks{color}. > 5. Balancer continues to copy the replica, eventually resulting in multiple > copies of a replica > !image-2022-01-10-17-31-35-910.png|width=642,height=269! > The set of {color:#ff}rescannedMisreplicatedBlocks{color} have so many > block to remove. > !image-2022-01-10-17-32-56-981.png|width=745,height=124! > As for the second question, we checked the code of > {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any > problem. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
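The internal-block indices in the table above can be read directly off the block ids in the fsck output: each internal block's id is the group's base id plus its striped replica index. A small sketch using the ids from this report (the plain subtraction is a simplification of what Hadoop's StripedBlockUtil does, but it matches every id listed here):

```java
public class EcInternalIndex {
    // Sketch: recover a striped (EC) internal block's replica index from its
    // block id. Assumption, consistent with the ids quoted in this report:
    // internal block id = block group base id + internal index.
    static int internalIndex(long groupBaseId, long internalBlockId) {
        return (int) (internalBlockId - groupBaseId);
    }

    public static void main(String[] args) {
        long base = -9223372036824119008L; // blk_-9223372036824119008 = index 0
        // The replica that accumulated 50 extra copies in the table above:
        System.out.println(internalIndex(base, -9223372036824119004L)); // 4
        // The last internal block (parity) in the ec(6+3) group:
        System.out.println(internalIndex(base, -9223372036824119000L)); // 8
    }
}
```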
[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto
[ https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801776#comment-17801776 ] ASF GitHub Bot commented on HDFS-17306: --- LiuGuH commented on code in PR #6385: URL: https://github.com/apache/hadoop/pull/6385#discussion_r1439428230 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterStateIdContext.java: ## @@ -85,7 +85,11 @@ public void setResponseHeaderState(RpcResponseHeaderProto.Builder headerBuilder) return; } RouterFederatedStateProto.Builder builder = RouterFederatedStateProto.newBuilder(); -namespaceIdMap.forEach((k, v) -> builder.putNamespaceStateIds(k, v.get())); +namespaceIdMap.forEach((k, v) -> { Review Comment: Done, Thanks > RBF:Router should not return nameservices that does not enable observer nodes > in RpcResponseHeaderProto > --- > > Key: HDFS-17306 > URL: https://issues.apache.org/jira/browse/HDFS-17306 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > If a cluster has 3 nameservices: ns1, ns2,ns3, and ns1 has observer > nodes, and client via DFSRouter comminutes with nns. > If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enable, the client will > receive all nameservices in RpcResponseHeaderProto. > We should reduce rpc response size if nameservices don't enable > observer nodes. >
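The diff fragment above replaces the unconditional forEach with a guarded one. A minimal sketch of the filtering idea, with a plain Map standing in for the protobuf builder; the Long.MIN_VALUE sentinel for "no observer state yet" is an illustrative assumption, not the exact condition used in the PR:

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceStateFilter {
    // Sketch of the idea behind the RouterStateIdContext change: only put a
    // nameservice's state id into the RPC response header when it actually
    // carries observer-read state. Assumption: nameservices without observer
    // nodes never advance their shared state id past Long.MIN_VALUE.
    static Map<String, Long> filter(Map<String, Long> namespaceIdMap) {
        Map<String, Long> header = new HashMap<>();
        namespaceIdMap.forEach((ns, stateId) -> {
            if (stateId != Long.MIN_VALUE) { // skip nameservices w/o observer state
                header.put(ns, stateId);
            }
        });
        return header;
    }
}
```

This keeps the response header proportional to the number of observer-enabled nameservices rather than all mounted nameservices, which is the size reduction the issue asks for.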
[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks
[ https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801738#comment-17801738 ] ASF GitHub Bot commented on HDFS-16420: --- echomyecho commented on PR #3880: URL: https://github.com/apache/hadoop/pull/3880#issuecomment-1873890061 hello, @Jackson-Wang-7 Does this fix adapt to Hadoop 3.1 version? > Avoid deleting unique data blocks when deleting redundancy striped blocks > - > > Key: HDFS-16420 > URL: https://issues.apache.org/jira/browse/HDFS-16420 > Project: Hadoop HDFS > Issue Type: Bug > Components: ec, erasure-coding >Reporter: qinyuren >Assignee: Jackson Wang >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0, 3.2.3, 3.3.2 > > Attachments: image-2022-01-10-17-31-35-910.png, > image-2022-01-10-17-32-56-981.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > We have a similar problem as HDFS-16297 described. > In our cluster, we used {color:#de350b}ec(6+3) + balancer with version > 3.1.0{color}, and the {color:#de350b}missing block{color} happened. > We got the block(blk_-9223372036824119008) info from fsck, only 5 live > replications and multiple redundant replications. > {code:java} > blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5 > blk_-9223372036824119007:DatanodeInfoWithStorage, > blk_-9223372036824119002:DatanodeInfoWithStorage, > blk_-9223372036824119001:DatanodeInfoWithStorage, > blk_-9223372036824119000:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage, > blk_-9223372036824119004:DatanodeInfoWithStorage {code} > > We searched the log from all datanode, and found that the internal blocks of > blk_-9223372036824119008 were deleted almost at the same time. 
> {code:java}
> 08:15:58,550 INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:run(333)) - Deleted BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
> The total numbers of internal blocks deleted during 08:15-08:17 are as follows:
> ||internal block||index||delete num||
> |blk_-9223372036824119008|0|1|
> |blk_-9223372036824119006|2|1|
> |blk_-9223372036824119005|3|1|
> |blk_-9223372036824119004|4|50|
> |blk_-9223372036824119003|5|1|
> |blk_-9223372036824119000|8|1|
> During 08:15 to 08:17 we restarted 2 datanodes and immediately triggered full block reports.
> There are 2 questions:
> 1. Why are there so many replicas of this block?
> 2. Why was an internal block with only one copy deleted?
> The reasons for the first problem may be as follows:
> 1. We set the full block report period of some datanodes to 168 hours.
> 2. We performed a namenode HA failover.
> 3. After the namenode HA failover, the storage state became stale, and it did not change until the next full block report.
> 4.
The balancer copied the replica without deleting it from the source node, because the source node had stale storage and the request was put into postponedMisreplicatedBlocks.
> 5. The balancer kept copying the replica, eventually producing multiple copies of it.
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of rescannedMisreplicatedBlocks has many blocks to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of processExtraRedundancyBlock but didn't find any problem.
-- This message was sent by Atlassian Jira (v8.20.10#820010)
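The failure mode above comes down to counting replicas per internal-block index: for an erasure-coded group, an internal block is redundant only if some other live replica holds the same index. A minimal sketch of that invariant (illustrative only — this is not the actual processExtraRedundancyBlock logic, and the names are hypothetical):

```java
// Hypothetical sketch, not HDFS code: a striped internal block is safely
// removable only when another live replica stores the same internal index.
// In the fsck output above, only index 4 had duplicates; indices 0, 2, 3,
// 5 and 8 each had a single copy and must never be deleted.
import java.util.*;

public class StripedRedundancyCheck {

    /** Returns the internal-block indices that have more than one live copy. */
    static Set<Integer> removableIndices(List<Integer> liveReplicaIndices) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int idx : liveReplicaIndices) {
            counts.merge(idx, 1, Integer::sum);  // count copies per index
        }
        Set<Integer> removable = new HashSet<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                removable.add(e.getKey());
            }
        }
        return removable;
    }
}
```

Under this invariant, the 50 extra copies of index 4 could be trimmed, but deleting the sole copies of the other indices leaves only 5 distinct indices of an ec(6+3) group alive — fewer than the 6 needed for reconstruction, which is exactly the MISSING state fsck reported.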
[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801724#comment-17801724 ] ASF GitHub Bot commented on HDFS-17315: --- hadoop-yetus commented on PR #6400: URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1873827133 :broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 37s | | Docker mode activated. |
| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| _ trunk Compile Tests _ |
| -1 :x: | mvninstall | 52m 1s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. |
| -1 :x: | compile | 0m 28s | [/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | compile | 0m 29s | [/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-hdfs in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| -0 :warning: | checkstyle | 0m 27s | [/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | The patch fails to run checkstyle in hadoop-hdfs |
| -1 :x: | mvnsite | 0m 29s | [/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in trunk failed. |
| -1 :x: | javadoc | 0m 28s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in trunk failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | javadoc | 0m 29s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt) | hadoop-hdfs in trunk failed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08. |
| -1 :x: | spotbugs | 4m 7s | [/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html) | hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings. |
| -1 :x: | shadedclient | 10m 11s | | branch has errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ |
| -1 :x: | mvninstall | 0m 22s | [/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch failed. |
| -1 :x: | compile | 0m 22s | [/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt) | hadoop-hdfs in the patch failed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04. |
| -1 :x: | javac | 0m 22s |
[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801715#comment-17801715 ] ASF GitHub Bot commented on HDFS-17317: --- xuzifu666 commented on PR #6402: URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1873798203 @ayushtkn PTAL for the minor fix > DebugAdmin metaOut not need multiple close > --- > > Key: HDFS-17317 > URL: https://issues.apache.org/jira/browse/HDFS-17317 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: xy >Priority: Major > Labels: pull-request-available > > DebugAdmin metaOut not need multiple close
[jira] [Updated] (HDFS-17317) DebugAdmin metaOut not need multiple close
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-17317: -- Labels: pull-request-available (was: ) > DebugAdmin metaOut not need multiple close > --- > > Key: HDFS-17317 > URL: https://issues.apache.org/jira/browse/HDFS-17317 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: xy >Priority: Major > Labels: pull-request-available > > DebugAdmin metaOut not need multiple close
[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close
[ https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801714#comment-17801714 ] ASF GitHub Bot commented on HDFS-17317: --- xuzifu666 opened a new pull request, #6402: URL: https://github.com/apache/hadoop/pull/6402 ### Description of PR DebugAdmin metaOut not need multiple close ### How was this patch tested? not need ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? > DebugAdmin metaOut not need multiple close > --- > > Key: HDFS-17317 > URL: https://issues.apache.org/jira/browse/HDFS-17317 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: xy >Priority: Major > > DebugAdmin metaOut not need multiple close
[jira] [Created] (HDFS-17317) DebugAdmin metaOut not need multiple close
xy created HDFS-17317: - Summary: DebugAdmin metaOut not need multiple close Key: HDFS-17317 URL: https://issues.apache.org/jira/browse/HDFS-17317 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: xy DebugAdmin metaOut not need multiple close
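The redundant close() that HDFS-17317 targets is the classic double-close pattern; with try-with-resources the stream is closed exactly once when the block exits, so no explicit close() call is needed. A hedged sketch of the pattern (illustrative names only — `metaOut` here is a stand-in, not the actual DebugAdmin field):

```java
// Hypothetical illustration of the single-close pattern, not DebugAdmin code.
import java.io.*;

public class SingleCloseDemo {

    static String writeAndReadBack(File f) throws IOException {
        try (PrintWriter metaOut = new PrintWriter(new FileWriter(f))) {
            metaOut.println("block metadata");
            // No explicit metaOut.close(): try-with-resources closes the
            // writer exactly once, even if an exception is thrown above.
        }
        try (BufferedReader in = new BufferedReader(new FileReader(f))) {
            return in.readLine();
        }
    }
}
```

An extra close() after such a block is harmless for most JDK streams (Closeable.close is idempotent there) but is dead code, which is why removing it is filed as an Improvement rather than a Bug.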
[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.
[ https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801701#comment-17801701 ] ASF GitHub Bot commented on HDFS-17315: --- huangzhaobo99 commented on PR #6400: URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1873747947 This warning is confusing to me; I will try passing it directly into FsImage to resolve it. https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html > Optimize the namenode format code logic. > > > Key: HDFS-17315 > URL: https://issues.apache.org/jira/browse/HDFS-17315 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: huangzhaobo99 >Assignee: huangzhaobo99 >Priority: Major > Labels: pull-request-available > > 1. Some invalid code was deleted in https://issues.apache.org/jira/browse/HDFS-17277, but one line of invalid code remains. > 2. Additionally, optimize the resource-closing logic to use 'try-with-resources'.
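Spotbugs obligation warnings like the one linked above typically fire when responsibility for closing a stream is ambiguous between caller and callee. One reading of the suggestion to pass the resource "directly into FsImage" is to keep ownership at the call site in a try-with-resources block; a hedged sketch with stand-in names (`writeImage` is illustrative, not the FsImage API):

```java
// Hypothetical sketch: the caller owns the stream via try-with-resources
// and hands it to the consumer, so the close obligation is unambiguous
// both to readers and to static analyzers such as spotbugs.
import java.io.*;

public class OwnershipDemo {

    /** Stand-in for the consumer (e.g. an image writer); it never closes `out`. */
    static void writeImage(OutputStream out) throws IOException {
        out.write("image-data".getBytes());
    }

    static long format(File f) throws IOException {
        try (OutputStream out = new FileOutputStream(f)) {
            writeImage(out);  // consumer uses the stream, caller closes it
        }
        return f.length();
    }
}
```

With ownership pinned to one scope, the analyzer can verify that every path out of format() closes the stream, which is the shape of fix the comment describes attempting.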