[jira] [Commented] (HDFS-17314) Add a metrics to record congestion backoff counts.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802039#comment-17802039
 ] 

ASF GitHub Bot commented on HDFS-17314:
---

hfutatzhanghb commented on PR #6398:
URL: https://github.com/apache/hadoop/pull/6398#issuecomment-1874945130

   Updated with unit test.




> Add a metrics to record congestion backoff counts.
> --
>
> Key: HDFS-17314
> URL: https://issues.apache.org/jira/browse/HDFS-17314
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>  Labels: pull-request-available
>
> When we enable congestion backoff, we should better know how many times 
> datanodes have told client backoff.  This metrics can help us know better 
> about congestion function.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802034#comment-17802034
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874933861

   > I have triggered a new build: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6402/2/
   > 
   > we need to get a green build before we can merge
   
   OK,Thanks




> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802033#comment-17802033
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

ayushtkn commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874932858

   I have triggered a new build:
   https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6402/2/
   
   we need to get a green build before we can merge




> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17305) Add avoid datanode reason count related metrics to namenode.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802030#comment-17802030
 ] 

ASF GitHub Bot commented on HDFS-17305:
---

huangzhaobo99 commented on PR #6393:
URL: https://github.com/apache/hadoop/pull/6393#issuecomment-1874927909

   Hi @tasanuma @ayushtkn,
   Please kindly review this PR as well if you have bandwidth. thanks.




> Add avoid datanode reason count related metrics to namenode.
> 
>
> Key: HDFS-17305
> URL: https://issues.apache.org/jira/browse/HDFS-17305
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>
> Now, there are slownode and load avoidance functions, mainly implemented in 
> the  BlockPlacementPolicyDefault class.
> 1. After triggering the exclusion condition, some logs will be printed on nn, 
> which can be used to troubleshoot anomalies in nn by checking the logs, the 
> code is as follows:
> {code:java}
> ...
> if (!node.isInService()) {
>   logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
>   return false;
> }
> if (avoidStaleNodes) {
>   if (node.isStale(this.staleInterval)) {
> logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
> return false;
>   }
> }
> ...{code}
> 2. If the exclusion condition is triggered, we can record it through metrics 
> and count the total number of exclusions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802029#comment-17802029
 ] 

ASF GitHub Bot commented on HDFS-17309:
---

LiuGuH commented on code in PR #6390:
URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440125583


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:
##
@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment:
   Good idea, make it initialized in the constructor. Thanks.





> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> With HDFS-17116, Router safemode check contidition use monotonicNow(). 
> For code in  RouterSafemodeService.periodicInvoke()
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> Function monotonicNow() is implemented with System.nanoTime(). 
> System.nanoTime() in javadoc description:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation maybe exists :
> If refreshCaches not success in the beginning, cacheUpdateTime will be 0 , 
> and now - cacheUpdateTime is arbitrary origin time,so isCacheStale maybe  be 
> true or false. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17302) RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802027#comment-17802027
 ] 

ASF GitHub Bot commented on HDFS-17302:
---

KeeProMise commented on PR #6380:
URL: https://github.com/apache/hadoop/pull/6380#issuecomment-1874924983

   @goiri @simbadzina hi, could you please help to review, thanks a lot!




> RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation.
> ---
>
> Key: HDFS-17302
> URL: https://issues.apache.org/jira/browse/HDFS-17302
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: rbf
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-17302.001.patch, HDFS-17302.002.patch, 
> HDFS-17302.003.patch
>
>
> h2. Current shortcomings
> [HDFS-14090|https://issues.apache.org/jira/browse/HDFS-14090] provides a 
> StaticRouterRpcFairnessPolicyController to support configuring different 
> handlers for different ns. Using the StaticRouterRpcFairnessPolicyController 
> allows the router to isolate different ns, and the ns with a higher load will 
> not affect the router's access to the ns with a normal load. But the 
> StaticRouterRpcFairnessPolicyController still falls short in many ways, such 
> as:
> 1. *Configuration is inconvenient and error-prone*: When I use 
> StaticRouterRpcFairnessPolicyController, I first need to know how many 
> handlers the router has in total, then I have to know how many nameservices 
> the router currently has, and then carefully calculate how many handlers to 
> allocate to each ns so that the sum of handlers for all ns will not exceed 
> the total handlers of the router, and I also need to consider how many 
> handlers to allocate to each ns to achieve better performance. Therefore, I 
> need to be very careful when configuring. Even if I configure only one more 
> handler for a certain ns, the total number is more than the number of 
> handlers owned by the router, which will also cause the router to fail to 
> start. At this time, I had to investigate the reason why the router failed to 
> start. After finding the reason, I had to reconsider the number of handlers 
> for each ns. In addition, when I reconfigure the total number of handlers on 
> the router, I have to re-allocate handlers to each ns, which undoubtedly 
> increases the complexity of operation and maintenance.
> 2. *Extension ns is not supported*: During the running of the router, if a 
> new ns is added to the cluster and a mount is added for the ns, but because 
> no handler is allocated for the ns, the ns cannot be accessed through the 
> router. We must reconfigure the number of handlers and then refresh the 
> configuration. At this time, the router can access the ns normally. When we 
> reconfigure the number of handlers, we have to face disadvantage 1: 
> Configuration is inconvenient and error-prone.
> 3. *Waste handlers*:  The main purpose of proposing 
> RouterRpcFairnessPolicyController is to enable the router to access ns with 
> normal load and not be affected by ns with higher load. First of all, not all 
> ns have high loads; secondly, ns with high loads do not have high loads 24 
> hours a day. It may be that only certain time periods, such as 0 to 8 
> o'clock, have high loads, and other time periods have normal loads. Assume 
> there are 2 ns, and each ns is allocated half of the number of handlers. 
> Assume that ns1 has many requests from 0 to 14 o'clock, and almost no 
> requests from 14 to 24 o'clock, ns2 has many requests from 12 to 24 o'clock, 
> and almost no requests from 0 to 14 o'clock; when it is between 0 o'clock and 
> 12 o'clock and between 14 o'clock and 24 o'clock, only one ns has more 
> requests and the other ns has almost no requests, so we have wasted half of 
> the number of handlers.
> 4. *Only isolation, no sharing*: The staticRouterRpcFairnessPolicyController 
> does not support sharing, only isolation. I think isolation is just a means 
> to improve the performance of router access to normal ns, not the purpose. It 
> is impossible for all ns in the cluster to have high loads. On the contrary, 
> in most scenarios, only a few ns in the cluster have high loads, and the 
> loads of most other ns are normal. For ns with higher load and ns with normal 
> load, we need to isolate their handlers so that the ns with higher load will 
> not affect the performance of ns with lower load. However, for nameservices 
> that are also under normal load, or are under higher load, we do not need to 
> isolate them, these ns of the same nature can share the handlers of the 
> router; The performance is better than assigning a fixed number of handlers 
> to each ns, because each ns can use all 

[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802023#comment-17802023
 ] 

ASF GitHub Bot commented on HDFS-17309:
---

slfan1989 commented on code in PR #6390:
URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440105699


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:
##
@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment:
   Thanks for the explanation! But the code looks weird because other variables 
are initialized in the constructor, can we initialize to `0` in the 
constructor? Just a personal opinion, let's wait goiri's view.





> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> With HDFS-17116, Router safemode check contidition use monotonicNow(). 
> For code in  RouterSafemodeService.periodicInvoke()
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> Function monotonicNow() is implemented with System.nanoTime(). 
> System.nanoTime() in javadoc description:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation maybe exists :
> If refreshCaches not success in the beginning, cacheUpdateTime will be 0 , 
> and now - cacheUpdateTime is arbitrary origin time,so isCacheStale maybe  be 
> true or false. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802022#comment-17802022
 ] 

ASF GitHub Bot commented on HDFS-17310:
---

slfan1989 commented on PR #6391:
URL: https://github.com/apache/hadoop/pull/6391#issuecomment-1874893095

   > Thanks @slfan1989 @ashutoshcipher help me reivew it.
   > Could you mind to push this modification forward when you have free time ? 
Thank you very much.
   
   @haiyang1987  Thanks for the contribution! LGTM. 
   
   But do we wait 1-2 working days for tasanuma to help review the PR?
   
   cc:@ashutoshcipher




> DiskBalancer: Enhance the log message for submitPlan
> 
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> In order to convenient troubleshoot problems, enhance the log message for 
> submitPlan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802011#comment-17802011
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

hadoop-yetus commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874843069

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 20s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 19s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 22s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 22s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 27s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 28s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  19m 35s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 21s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 17s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 11s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 22s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  19m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 15s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 24s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   |  99m 47s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6385 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux da6d00d6c07f 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c0e9750ff4cc8d86495cfc85f67261d9b7e7d4e2 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/testReport/ |
   | Max. process+thread count | 2624 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/3/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> RBF:Router should not return nameservices that does not enable observer nodes 
> 

[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801997#comment-17801997
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1874801552

   @ayushtkn Thanks for your review,could you help to merge it?CI seems hang




> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801996#comment-17801996
 ] 

ASF GitHub Bot commented on HDFS-17311:
---

LiuGuH commented on code in PR #6392:
URL: https://github.com/apache/hadoop/pull/6392#discussion_r1440027525


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionManager.java:
##
@@ -229,7 +229,7 @@ public ConnectionContext getConnection(UserGroupInformation 
ugi,
 
 // Add a new connection to the pool if it wasn't usable
 if (conn == null || !conn.isUsable()) {
-  if (!this.creatorQueue.offer(pool)) {
+  if (!this.creatorQueue.contains(pool) && !this.creatorQueue.offer(pool)) 
{

Review Comment:
   Prevents duplicate pool from being added to the creatorQueue.   
getConnection() will be concurrent called, so createQueue will be added 
duplicate pool.





> RBF: ConnectionManager creatorQueue should offer a pool that is not already 
> in creatorQueue.
> 
>
> Key: HDFS-17311
> URL: https://issues.apache.org/jira/browse/HDFS-17311
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> 2023-12-29 15:18:54,799 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add 
> more than 2048 connections at the same time
> In my environment, ConnectionManager creatorQueue is full ,but the cluster 
> does not have so many users cloud reach up  2048 pair of  in router.
> In the case of a large number of concurrent  creatorQueue add same pool more 
> than once.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801995#comment-17801995
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

LiuGuH commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874794589

   > @LiuGuH Thanks for your contribution! We need to fix checkstyle.
   
   Thanks for review. Fixed 




> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
>       If a cluster has 3 nameservices: ns1, ns2,ns3, and  ns1 has observer 
> nodes, and client via DFSRouter comminutes with nns.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enable,  the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>        We should reduce rpc response size if nameservices don't enable 
> observer nodes.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801994#comment-17801994
 ] 

ASF GitHub Bot commented on HDFS-17309:
---

LiuGuH commented on code in PR #6390:
URL: https://github.com/apache/hadoop/pull/6390#discussion_r1440022147


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:
##
@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment:
   Thanks for review.
   
   Yes, the default value of long is 0, but  I think it is better to assign to 
0 for emphasis on completing initialization.





> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> With HDFS-17116, Router safemode check contidition use monotonicNow(). 
> For code in  RouterSafemodeService.periodicInvoke()
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> Function monotonicNow() is implemented with System.nanoTime(). 
> System.nanoTime() in javadoc description:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation maybe exists :
> If refreshCaches not success in the beginning, cacheUpdateTime will be 0 , 
> and now - cacheUpdateTime is arbitrary origin time,so isCacheStale maybe  be 
> true or false. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17310) DiskBalancer: Enhance the log message for submitPlan

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801989#comment-17801989
 ] 

ASF GitHub Bot commented on HDFS-17310:
---

haiyang1987 commented on PR #6391:
URL: https://github.com/apache/hadoop/pull/6391#issuecomment-1874777837

   Thanks @slfan1989 @ashutoshcipher help me reivew it. 
   Could you mind to push this modification forward when you have free time ? 
Thank you very much.




> DiskBalancer: Enhance the log message for submitPlan
> 
>
> Key: HDFS-17310
> URL: https://issues.apache.org/jira/browse/HDFS-17310
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> In order to convenient troubleshoot problems, enhance the log message for 
> submitPlan.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801979#comment-17801979
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

li-leyang commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874756402

   @simbadzina Please take look. The yetus waring is fixed.




> HDFS: add client rpc backoff metrics due to disconnection from lowest 
> priority queue
> 
>
> Key: HDFS-17290
> URL: https://issues.apache.org/jira/browse/HDFS-17290
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0, 3.4.0
>Reporter: Lei Yang
>Assignee: Lei Yang
>Priority: Major
>  Labels: pull-request-available
>
> Clients are backoff when rpcs cannot be enqueued. However there are different 
> scenarios when backoff could happen. Currently there is no way to 
> differenciate whether a backoff happened due to lowest prio+disconnection or 
> queue overflow from higher priority queues when connection between client and 
> namenode remains open. Currently IPC server just emits a single metrics for 
> all the backoffs.
> Example:
>  # Client are directly enqueued into lowest priority queue and backoff when 
> lowest queue is full. Client are expected to disconnect from namenode.
>  # Client are enqueued into non-lowest priority queue and overflowed all the 
> way down to lowest priority queue and back off. In this case, connection 
> between client and namenode remains open.
> We would like to add metrics for #1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17313) dfsadmin -reconfig option to start/query reconfig on all live namenodes.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801972#comment-17801972
 ] 

ASF GitHub Bot commented on HDFS-17313:
---

huangzhaobo99 commented on PR #6395:
URL: https://github.com/apache/hadoop/pull/6395#issuecomment-1874739359

   Hi @tomscut @virajjasani If you have time, please help me review the code.




> dfsadmin -reconfig option to start/query reconfig on all live namenodes.
> 
>
> Key: HDFS-17313
> URL: https://issues.apache.org/jira/browse/HDFS-17313
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/HDFS-16568 Support batch refreshing of 
> datanode configurations.
> There are several nn in the HA or Federated Cluster, and this ticket 
> implements batch refreshing of nn configurations.
> *Implementation method*
>  # Use the DFSUtil.getNNServiceRpcAddressesForCluster method to parse the 
> configuration and obtain the addresses of all nn's
>  # Using two worker threads, currently does not support configuring the 
> number of worker threads (will be implemented in other ticket if necessary)
> *Sample outputs*
> {code:java}
> $ bin/hdfs dfsadmin -reconfig namenode livenodes start
> Started reconfiguration task on node [localhost:50034].
> Started reconfiguration task on node [localhost:50036].
> Started reconfiguration task on node [localhost:50038].
> Started reconfiguration task on node [localhost:50040]. 
> Starting of reconfiguration task successful on 4 nodes, failed on 0 nodes.
> $ bin/hdfs dfsadmin -reconfig namenode livenodes status
> Reconfiguring status for node [localhost:50034]
> SUCCESS: Changed property dfs.heartbeat.interval
>   From: "5"
>   To: "3"
> Reconfiguring status for node [localhost:50036]
> SUCCESS: Changed property dfs.heartbeat.interval
>   From: "5"
>   To: "3"
> Reconfiguring status for node [localhost:50038]
> SUCCESS: Changed property dfs.heartbeat.interval
>   From: "5"
>   To: "3"
> Reconfiguring status for node [localhost:50040]
> SUCCESS: Changed property dfs.heartbeat.interval
>   From: "5"
>   To: "3"
> Retrieval of reconfiguration status successful on 4 nodes, failed on 0 
> nodes.{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801970#comment-17801970
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874720576

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  18m 58s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  47m  7s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  18m 14s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  16m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m 19s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 13s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 49s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 38s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  39m 48s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 55s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  17m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  17m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  16m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  16m 22s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m 13s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 215 
unchanged - 0 fixed = 217 total (was 215)  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 39s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  39m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 22s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 253m  0s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 4de20a61222e 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/11/testReport/ |
   | Max. process+thread count | 3137 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 

[jira] [Commented] (HDFS-17311) RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801962#comment-17801962
 ] 

ASF GitHub Bot commented on HDFS-17311:
---

slfan1989 commented on code in PR #6392:
URL: https://github.com/apache/hadoop/pull/6392#discussion_r1439961113


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/ConnectionManager.java:
##
@@ -229,7 +229,7 @@ public ConnectionContext getConnection(UserGroupInformation 
ugi,
 
 // Add a new connection to the pool if it wasn't usable
 if (conn == null || !conn.isUsable()) {
-  if (!this.creatorQueue.offer(pool)) {
+  if (!this.creatorQueue.contains(pool) && !this.creatorQueue.offer(pool)) 
{

Review Comment:
   Sorry, I don’t understand the meaning of this change. Can we explain the 
reason for this change?





> RBF: ConnectionManager creatorQueue should offer a pool that is not already 
> in creatorQueue.
> 
>
> Key: HDFS-17311
> URL: https://issues.apache.org/jira/browse/HDFS-17311
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> 2023-12-29 15:18:54,799 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.ConnectionManager: Cannot add 
> more than 2048 connections at the same time
> In my environment, ConnectionManager creatorQueue is full ,but the cluster 
> does not have so many users cloud reach up  2048 pair of  in router.
> In the case of a large number of concurrent  creatorQueue add same pool more 
> than once.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HDFS-17306:
-

Assignee: liuguanghua

> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
>       If a cluster has 3 nameservices: ns1, ns2,ns3, and  ns1 has observer 
> nodes, and client via DFSRouter comminutes with nns.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enable,  the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>        We should reduce rpc response size if nameservices don't enable 
> observer nodes.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801955#comment-17801955
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

slfan1989 commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874674522

   @LiuGuH Thanks for your contribution! We need to fix checkstyle.




> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
>       If a cluster has 3 nameservices: ns1, ns2,ns3, and  ns1 has observer 
> nodes, and client via DFSRouter comminutes with nns.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enable,  the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>        We should reduce rpc response size if nameservices don't enable 
> observer nodes.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17309) RBF: Fix Router Safemode check contidition error

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801954#comment-17801954
 ] 

ASF GitHub Bot commented on HDFS-17309:
---

slfan1989 commented on code in PR #6390:
URL: https://github.com/apache/hadoop/pull/6390#discussion_r1439936960


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/StateStoreService.java:
##
@@ -116,7 +116,7 @@ public class StateStoreService extends CompositeService {
   /** Service to maintain State Store caches. */
   private StateStoreCacheUpdateService cacheUpdater;
   /** Time the cache was last successfully updated. */
-  private long cacheLastUpdateTime;
+  private long cacheLastUpdateTime = 0;

Review Comment:
   Is this change necessary? The default value of long is `0`.





> RBF: Fix Router Safemode check contidition error
> 
>
> Key: HDFS-17309
> URL: https://issues.apache.org/jira/browse/HDFS-17309
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
> With HDFS-17116, Router safemode check contidition use monotonicNow(). 
> For code in  RouterSafemodeService.periodicInvoke()
> long now = monotonicNow();
> long cacheUpdateTime = stateStore.getCacheUpdateTime();
> boolean isCacheStale = (now - cacheUpdateTime) > this.staleInterval;
>  
> Function monotonicNow() is implemented with System.nanoTime(). 
> System.nanoTime() in javadoc description:
> This method can only be used to measure elapsed time and is not related to 
> any other notion of system or wall-clock time. The value returned represents 
> nanoseconds since some fixed but arbitrary origin time (perhaps in the 
> future, so values may be negative). 
>  
> The following situation maybe exists :
> If refreshCaches not success in the beginning, cacheUpdateTime will be 0 , 
> and now - cacheUpdateTime is arbitrary origin time,so isCacheStale maybe  be 
> true or false. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801949#comment-17801949
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874656850

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   8m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 26s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   8m 59s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 51s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 40s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 16s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 14s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   9m 14s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 52s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   8m 52s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 33s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 215 
unchanged - 0 fixed = 217 total (was 215)  |
   | +1 :green_heart: |  mvnsite  |   0m 49s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 37s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 22s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  15m 51s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 34s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 147m 12s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux 84f1b153d50d 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/12/testReport/ |
   | Max. process+thread count | 3150 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 

[jira] [Commented] (HDFS-17290) HDFS: add client rpc backoff metrics due to disconnection from lowest priority queue

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801948#comment-17801948
 ] 

ASF GitHub Bot commented on HDFS-17290:
---

hadoop-yetus commented on PR #6359:
URL: https://github.com/apache/hadoop/pull/6359#issuecomment-1874656622

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 25s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  31m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   9m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   8m 58s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 38s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 27s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 35s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 23s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   9m 12s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   9m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   8m 50s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   8m 50s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 32s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 2 new + 215 
unchanged - 0 fixed = 217 total (was 215)  |
   | +1 :green_heart: |  mvnsite  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 34s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 18s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  15m 36s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 31s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 138m 41s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6359 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint 
|
   | uname | Linux a4fd3515e5e9 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4e12859a9591dbe9623119b57b1c5f472f3ab0af |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6359/13/testReport/ |
   | Max. process+thread count | 5218 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 

[jira] [Commented] (HDFS-17316) Compatibility Benchmark over HCFS Implementations

2024-01-02 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801917#comment-17801917
 ] 

Steve Loughran commented on HDFS-17316:
---

I'd propose decoupling this from the core hadoop/ source tree so it can be 
built against 3.3 and 

bq. there is no formal suite to do compatibility assessment of a file system 
for all such HCFS implementations. Thus, whether the functionality is well 
accomplished and meets the core compatible expectations mainly relies on 
service provider's own report. 

# filesystem contract tests are designed to do this from junit;  If your FS 
implementation doesn't subclass and run these, you need to start there.
# filesystem API specification is intended to specify the API and document 
where problems surface. maintenance there always welcome -and as the contract 
tests are derived from it, enhancements in those tests to follow
# there's also terasort to validate commit protocols
# + distcp contract tests for its semantics
# dfsio does a lot, but needs maintenance -it only targets the clusterfs, when 
really you should be able to point at cloud storage from your own computer. 
extending that to take a specific target fs would be good.
# output must go into the class ant junit xml format so jenkins can present it.

We can create a new hadoop git repo for this. Do you have existing code and any 
detailed specification/docs. this also allows you to add dependencies on other 
things, e.g. spark.




> Compatibility Benchmark over HCFS Implementations
> -
>
> Key: HDFS-17316
> URL: https://issues.apache.org/jira/browse/HDFS-17316
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Han Liu
>Priority: Major
>
> {*}Background:{*}Hadoop-Compatible File System (HCFS) is a core conception in 
> big data storage ecosystem, providing unified interfaces and generally clear 
> semantics, and has become the de-factor standard for industry storage systems 
> to follow and conform with. There have been a series of HCFS implementations 
> in Hadoop, such as S3AFileSystem for Amazon's S3 Object Store, WASB for 
> Microsoft's Azure Blob Storage and OSS connector for Alibaba Cloud Object 
> Storage, and more from storage service's providers on their own.
> {*}Problems:{*}However, as indicated by introduction.md, there is no formal 
> suite to do compatibility assessment of a file system for all such HCFS 
> implementations. Thus, whether the functionality is well accomplished and 
> meets the core compatible expectations mainly relies on service provider's 
> own report. Meanwhile, Hadoop is also developing and new features are 
> continuously contributing to HCFS interfaces for existing implementations to 
> follow and update, in which case, Hadoop also needs a tool to quickly assess 
> if these features are supported or not for a specific HCFS implementation. 
> Besides, the known hadoop command line tool or hdfs shell is used to directly 
> interact with a HCFS storage system, where most commands correspond to 
> specific HCFS interfaces and work well. Still, there are cases that are 
> complicated and may not work, like expunge command. To check such commands 
> for an HCFS, we also need an approach to figure them out.
> {*}Proposal:{*}Accordingly, we propose to define a formal HCFS compatibility 
> benchmark and provide corresponding tool to do the compatibility assessment 
> for an HCFS storage system. The benchmark and tool should consider both HCFS 
> interfaces and hdfs shell commands. Different scenarios require different 
> kinds of compatibilities. For such consideration, we could define different 
> suites in the benchmark.
> *Benefits:* We intend the benchmark and tool to be useful for both storage 
> providers and storage users. For end users, it can be used to evalute the 
> compatibility level and determine if the storage system in question is 
> suitable for the required scenarios. For storage providers, it helps to 
> quickly generate an objective and reliable report about core functioins of 
> the storage service. As an instance, if the HCFS got a 100% on a suite named 
> 'tpcds', it is demonstrated that all functions needed by a tpcds program have 
> been well achieved. It is also a guide indicating how storage service 
> abilities can map to HCFS interfaces, such as storage class on S3.
> Any thoughts? Comments and feedback are mostly welcomed. Thanks in advance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801889#comment-17801889
 ] 

Rushabh Shah commented on HDFS-17299:
-

Thank you [~ayushtkn] 

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 60 
> dfs.heartbeat.interval: 3 
> So it will take 123 ms (20.5mins) to detect that datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find blocks locations that satisfies the rack 
> placement policy (one copy in each rack which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801881#comment-17801881
 ] 

Ayush Saxena commented on HDFS-17299:
-

Done!!!

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 60 
> dfs.heartbeat.interval: 3 
> So it will take 123 ms (20.5mins) to detect that datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find blocks locations that satisfies the rack 
> placement policy (one copy in each rack which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 

[jira] [Assigned] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17299:
---

Assignee: Ritesh

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 60 
> dfs.heartbeat.interval: 3 
> So it will take 123 ms (20.5mins) to detect that datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find blocks locations that satisfies the rack 
> placement policy (one copy in each rack which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,712 WARN  

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Rushabh Shah (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801867#comment-17801867
 ] 

Rushabh Shah commented on HDFS-17299:
-

[~gargrite] is interested to work on this jira. [~ayushtkn]  [~hexiaoqiao] Can 
one of you please add him to the Contributors list so that I can assign the 
Jira to him. Thank you!

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 60 
> dfs.heartbeat.interval: 3 
> So it will take 123 ms (20.5mins) to detect that datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find blocks locations that satisfies the rack 
> placement policy (one copy in each rack which essentially means one copy in 
> each AZ)
>  # Since all the datanodes in that AZ are down but still alive to namenode, 
> the client gets different datanodes but still all of them are in the same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> 

[jira] [Created] (HDFS-17319) Downgrade noisy InvalidToken log in ShortCircuitCache

2024-01-02 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HDFS-17319:


 Summary: Downgrade noisy InvalidToken log in ShortCircuitCache
 Key: HDFS-17319
 URL: https://issues.apache.org/jira/browse/HDFS-17319
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Bryan Beaudreault


ShortCircuitCache logs an exception whenever InvalidToken is detected (see 
below). As I understand it, this is part of normal operations when block tokens 
are enabled. So this log seems really noisy. I think we should downgrade it to 
DEBUG, or at least remove the stacktrace. It leads someone to thinking they 
have a problem, when they don't.
{code:java}
2024-01-02T16:02:51,621 [hedgedRead-1545] INFO 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0xbac84bc): could not load 
1437522350_BP-1420092181-ip-1658432093559 due to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control 
error while attempting to set up short-circuit access to 
/hbase/data/default/hbase-table-1/23f85f2d91e4967ce389d1a09c43e46d/0/609ccd5d7fcb4830a6602ddaea5ed27e
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:651)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360)
 ~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) 
~[hadoop-hdfs-client-3.3.1.jar:?]
        at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1180)
 ~[hadoop-hdfs-client-3.3.1.jar:?] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801858#comment-17801858
 ] 

ASF GitHub Bot commented on HDFS-17315:
---

hadoop-yetus commented on PR #6400:
URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1874275707

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 35s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 39s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | -1 :x: |  spotbugs  |   3m 18s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings.  |
   | +1 :green_heart: |  shadedclient  |  35m  3s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 56s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 14s |  |  
hadoop-hdfs-project/hadoop-hdfs generated 0 new + 0 unchanged - 1 fixed = 0 
total (was 1)  |
   | +1 :green_heart: |  shadedclient  |  34m 21s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 217m 11s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 43s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 351m 48s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6400 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c1770dcc0068 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 303095018bd08b43a2043a12e62bde76961d354d |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801832#comment-17801832
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

hadoop-yetus commented on PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#issuecomment-1874212168

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  18m 16s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  46m 17s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 29s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  40m 55s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 35s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 18s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-project/hadoop-hdfs-rbf: The patch generated 1 new + 2 
unchanged - 0 fixed = 3 total (was 2)  |
   | +1 :green_heart: |  mvnsite  |   0m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   0m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_392-8u392-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  38m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  23m  9s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 35s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 181m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/6385 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 7135c90f0234 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 
15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / b9fc917aa851c6479366be82d5b5cefcc11c4699 |
   | Default Java | Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6385/2/testReport/ |
   | Max. process+thread count | 2406 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 

[jira] [Assigned] (HDFS-17318) RBF: MountTableResolver#locationCache supports multi policies

2024-01-02 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba reassigned HDFS-17318:


Assignee: farmmamba

> RBF: MountTableResolver#locationCache supports multi policies
> -
>
> Key: HDFS-17318
> URL: https://issues.apache.org/jira/browse/HDFS-17318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17318) MountTableResolver#locationCache supports multi policies

2024-01-02 Thread farmmamba (Jira)
farmmamba created HDFS-17318:


 Summary: MountTableResolver#locationCache supports multi policies
 Key: HDFS-17318
 URL: https://issues.apache.org/jira/browse/HDFS-17318
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: farmmamba






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17318) RBF: MountTableResolver#locationCache supports multi policies

2024-01-02 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba updated HDFS-17318:
-
Summary: RBF: MountTableResolver#locationCache supports multi policies  
(was: MountTableResolver#locationCache supports multi policies)

> RBF: MountTableResolver#locationCache supports multi policies
> -
>
> Key: HDFS-17318
> URL: https://issues.apache.org/jira/browse/HDFS-17318
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: farmmamba
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801791#comment-17801791
 ] 

ASF GitHub Bot commented on HDFS-16420:
---

LoseYSelf commented on PR #3880:
URL: https://github.com/apache/hadoop/pull/3880#issuecomment-1874050010

   > hello, @Jackson-Wang-7 Does this fix adapt to Hadoop 3.1 version?
   
   No




> Avoid deleting unique data blocks when deleting redundancy striped blocks
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: qinyuren
>Assignee: Jackson Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a similar problem as HDFS-16297 described. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and the {color:#de350b}missing block{color} happened. 
> We got the block(blk_-9223372036824119008) info from fsck, only 5 live 
> replications and multiple redundant replications. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the log from all datanode, and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 are as follows
> ||internal block||index||    delete num||
> |blk_-9223372036824119008      
> blk_-9223372036824119006         
> blk_-9223372036824119005         
> blk_-9223372036824119004         
> blk_-9223372036824119003         
> blk_-9223372036824119000        |0
> 2
> 3
> 4
> 5
> 8|        1
>         1
>         1  
>         50
>         1
>         1|
>  
> {color:#ff}During 08:15 to 08:17, we restarted 2 datanode and triggered 
> full block report immediately.{color}
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why delete the internal block with only one copy?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanode to 168 hours.
> 2. We have done a namenode HA operation.
> 3. After namenode HA, the state of storage became 
> {color:#ff}stale{color}, and the state not change until next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from source 
> node, because the source node have the stale storage, and the request was put 
> into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. Balancer continues to copy the replica, eventually resulting in multiple 
> copies of a replica
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of {color:#ff}rescannedMisreplicatedBlocks{color} have so many 
> block to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HDFS-17306) RBF:Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801776#comment-17801776
 ] 

ASF GitHub Bot commented on HDFS-17306:
---

LiuGuH commented on code in PR #6385:
URL: https://github.com/apache/hadoop/pull/6385#discussion_r1439428230


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterStateIdContext.java:
##
@@ -85,7 +85,11 @@ public void 
setResponseHeaderState(RpcResponseHeaderProto.Builder headerBuilder)
   return;
 }
 RouterFederatedStateProto.Builder builder = 
RouterFederatedStateProto.newBuilder();
-namespaceIdMap.forEach((k, v) -> builder.putNamespaceStateIds(k, v.get()));
+namespaceIdMap.forEach((k, v) -> {

Review Comment:
   Done,  Thanks





> RBF:Router should not return nameservices that does not enable observer nodes 
> in RpcResponseHeaderProto
> ---
>
> Key: HDFS-17306
> URL: https://issues.apache.org/jira/browse/HDFS-17306
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Major
>  Labels: pull-request-available
>
>       If a cluster has 3 nameservices: ns1, ns2,ns3, and  ns1 has observer 
> nodes, and client via DFSRouter comminutes with nns.
>       If DFS_ROUTER_OBSERVER_READ_DEFAULT_KEY enable,  the client will 
> receive all nameservices in RpcResponseHeaderProto. 
>        We should reduce rpc response size if nameservices don't enable 
> observer nodes.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16420) Avoid deleting unique data blocks when deleting redundancy striped blocks

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801738#comment-17801738
 ] 

ASF GitHub Bot commented on HDFS-16420:
---

echomyecho commented on PR #3880:
URL: https://github.com/apache/hadoop/pull/3880#issuecomment-1873890061

   hello, @Jackson-Wang-7   Does this fix adapt to Hadoop 3.1 version?




> Avoid deleting unique data blocks when deleting redundancy striped blocks
> -
>
> Key: HDFS-16420
> URL: https://issues.apache.org/jira/browse/HDFS-16420
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec, erasure-coding
>Reporter: qinyuren
>Assignee: Jackson Wang
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.3, 3.3.2
>
> Attachments: image-2022-01-10-17-31-35-910.png, 
> image-2022-01-10-17-32-56-981.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We have a similar problem as HDFS-16297 described. 
> In our cluster, we used {color:#de350b}ec(6+3) + balancer with version 
> 3.1.0{color}, and the {color:#de350b}missing block{color} happened. 
> We got the block(blk_-9223372036824119008) info from fsck, only 5 live 
> replications and multiple redundant replications. 
> {code:java}
> blk_-9223372036824119008_220037616 len=133370338 MISSING! Live_repl=5
> blk_-9223372036824119007:DatanodeInfoWithStorage,   
> blk_-9223372036824119002:DatanodeInfoWithStorage,    
> blk_-9223372036824119001:DatanodeInfoWithStorage,  
> blk_-9223372036824119000:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage,  
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage, 
> blk_-9223372036824119004:DatanodeInfoWithStorage {code}
>    
> We searched the log from all datanode, and found that the internal blocks of 
> blk_-9223372036824119008 were deleted almost at the same time.
>  
> {code:java}
> 08:15:58,550 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119008_220037616 URI 
> file:/data15/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119008
> 08:16:21,214 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119006_220037616 URI 
> file:/data4/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119006
> 08:16:55,737 INFO  impl.FsDatasetAsyncDiskService 
> (FsDatasetAsyncDiskService.java:run(333)) - Deleted 
> BP-1606066499--1606188026755 blk_-9223372036824119005_220037616 URI 
> file:/data2/hadoop/hdfs/data/current/BP-1606066499--1606188026755/current/finalized/subdir19/subdir9/blk_-9223372036824119005
> {code}
>  
> The total number of internal blocks deleted during 08:15-08:17 are as follows
> ||internal block||index||    delete num||
> |blk_-9223372036824119008      
> blk_-9223372036824119006         
> blk_-9223372036824119005         
> blk_-9223372036824119004         
> blk_-9223372036824119003         
> blk_-9223372036824119000        |0
> 2
> 3
> 4
> 5
> 8|        1
>         1
>         1  
>         50
>         1
>         1|
>  
> {color:#ff}During 08:15 to 08:17, we restarted 2 datanode and triggered 
> full block report immediately.{color}
>  
> There are 2 questions: 
> 1. Why are there so many replicas of this block?
> 2. Why delete the internal block with only one copy?
> The reasons for the first problem may be as follows: 
> 1. We set the full block report period of some datanode to 168 hours.
> 2. We have done a namenode HA operation.
> 3. After namenode HA, the state of storage became 
> {color:#ff}stale{color}, and the state not change until next full block 
> report.
> 4. The balancer copied the replica without deleting the replica from source 
> node, because the source node have the stale storage, and the request was put 
> into {color:#ff}postponedMisreplicatedBlocks{color}.
> 5. Balancer continues to copy the replica, eventually resulting in multiple 
> copies of a replica
> !image-2022-01-10-17-31-35-910.png|width=642,height=269!
> The set of {color:#ff}rescannedMisreplicatedBlocks{color} have so many 
> block to remove.
> !image-2022-01-10-17-32-56-981.png|width=745,height=124!
> As for the second question, we checked the code of 
> {color:#de350b}processExtraRedundancyBlock{color}, but didn't find any 
> problem.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801724#comment-17801724
 ] 

ASF GitHub Bot commented on HDFS-17315:
---

hadoop-yetus commented on PR #6400:
URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1873827133

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | -1 :x: |  mvninstall  |  52m  1s | 
[/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-mvninstall-root.txt)
 |  root in trunk failed.  |
   | -1 :x: |  compile  |   0m 28s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  compile  |   0m 29s | 
[/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -0 :warning: |  checkstyle  |   0m 27s | 
[/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/buildtool-branch-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  The patch fails to run checkstyle in hadoop-hdfs  |
   | -1 :x: |  mvnsite  |   0m 29s | 
[/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in trunk failed.  |
   | -1 :x: |  javadoc  |   0m 28s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in trunk failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javadoc  |   0m 29s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_392-8u392-ga-1~20.04-b08.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-1.8.0_392-8u392-ga-1~20.04-b08.  |
   | -1 :x: |  spotbugs  |   4m  7s | 
[/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html)
 |  hadoop-hdfs-project/hadoop-hdfs in trunk has 1 extant spotbugs warnings.  |
   | -1 :x: |  shadedclient  |  10m 11s |  |  branch has errors when building 
and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | -1 :x: |  mvninstall  |   0m 22s | 
[/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch failed.  |
   | -1 :x: |  compile  |   0m 22s | 
[/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/2/artifact/out/patch-compile-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.txt)
 |  hadoop-hdfs in the patch failed with JDK 
Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04.  |
   | -1 :x: |  javac  |   0m 22s | 

[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801715#comment-17801715
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 commented on PR #6402:
URL: https://github.com/apache/hadoop/pull/6402#issuecomment-1873798203

   @ayushtkn PTAL for the minor fix




> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17317:
--
Labels: pull-request-available  (was: )

> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801714#comment-17801714
 ] 

ASF GitHub Bot commented on HDFS-17317:
---

xuzifu666 opened a new pull request, #6402:
URL: https://github.com/apache/hadoop/pull/6402

   
   
   ### Description of PR
   DebugAdmin metaOut not need multiple close
   
   
   ### How was this patch tested?
   not need
   
   
   ### For code changes:
   
   - [ ] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Priority: Major
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-02 Thread xy (Jira)
xy created HDFS-17317:
-

 Summary: DebugAdmin metaOut not  need multiple close
 Key: HDFS-17317
 URL: https://issues.apache.org/jira/browse/HDFS-17317
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: xy


DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17315) Optimize the namenode format code logic.

2024-01-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801701#comment-17801701
 ] 

ASF GitHub Bot commented on HDFS-17315:
---

huangzhaobo99 commented on PR #6400:
URL: https://github.com/apache/hadoop/pull/6400#issuecomment-1873747947

   This warning is confusing for me, try passing it directly into FsImage to 
resolve this warning.
   
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6400/1/artifact/out/branch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-warnings.html




> Optimize the namenode format code logic.
> 
>
> Key: HDFS-17315
> URL: https://issues.apache.org/jira/browse/HDFS-17315
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>
> 1. https://issues.apache.org/jira/browse/HDFS-17277 Some invalid codes have 
> been deleted in, but there is still one line of invalid code that has not 
> been deleted.
> 2. Additionally, optimize resource closure logic and use 'try-with-resources' 
> processing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org