[jira] [Commented] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.

2023-08-08 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752284#comment-17752284
 ] 

Xiaoqiao He commented on HDFS-17149:


Hi [~zhanghaobo], Please check if HDFS-15079 can solve this issue.

> getBlockLocations RPC should use actual client ip to compute network distance 
> when using RBF.
> -
>
> Key: HDFS-17149
> URL: https://issues.apache.org/jira/browse/HDFS-17149
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Major
>
> Please correct me if I have misunderstood. Thanks.
> Currently, when a getBlockLocations RPC is forwarded to the NameNode via a 
> Router, the NameNode uses the Router's IP address as the client machine when 
> computing network distance to the block's locations. See the 
> FSNamesystem#sortLocatedBlocksMore method for details.
> I think this computation is incorrect and should use the actual client IP.
>  
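
To illustrate why the reference address matters, here is a minimal sketch of distance-based replica sorting. It is not Hadoop's NetworkTopology or FSNamesystem code: the class name, the method names, and the "/rack/host" path format are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of topology-distance sorting; not the HDFS implementation. */
class TopologyDistanceSketch {

  /**
   * Distance between two topology paths such as "/rack1/host-a":
   * the number of hops from each node up to their closest common ancestor.
   * Assumes both paths start with "/".
   */
  static int distance(String a, String b) {
    String[] pa = a.substring(1).split("/");
    String[] pb = b.substring(1).split("/");
    int common = 0;
    while (common < pa.length && common < pb.length
        && pa[common].equals(pb[common])) {
      common++;
    }
    return (pa.length - common) + (pb.length - common);
  }

  /** Returns a copy of the replica locations, nearest to the reader first. */
  static List<String> sortByDistance(String reader, List<String> replicas) {
    List<String> sorted = new ArrayList<>(replicas);
    sorted.sort(Comparator.comparingInt(r -> distance(reader, r)));
    return sorted;
  }
}
```

With replicas on "/rack1/dn1" and "/rack2/dn2", sorting for a reader at "/rack1/client" puts dn1 first, while sorting for a forwarding node at "/rack2/router" puts dn2 first. That difference is the effect the issue describes.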



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752282#comment-17752282
 ] 

ASF GitHub Bot commented on HDFS-17030:
---

xinglin commented on PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#issuecomment-1670751349

   > > Hi @goiri,
   > > could you take a look at this backport PR for branch-3.3 as well? thanks,
   > 
   > You'd have to put a separate PR together I'd say.
   
   I am confused: this is a separate PR, right?




> Limit wait time for getHAServiceState in ObserverReaderProxy
> 
>
> Key: HDFS-17030
> URL: https://issues.apache.org/jira/browse/HDFS-17030
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: Xing Lin
>Assignee: Xing Lin
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When NameNode HA is enabled and a standby NN is not responsive, we have 
> observed that it takes a long time to serve a request, even though we have a 
> healthy observer or active NN. 
> Basically, when a standby is down, the RPC client would (re)try to create a 
> socket connection to that standby for _ipc.client.connect.timeout_ * 
> _ipc.client.connect.max.retries.on.timeouts_ before giving up. When we take a 
> heap dump at a standby, the NN still accepts the socket connection but it 
> won't send responses to these RPC requests, and we would time out after 
> _ipc.client.rpc-timeout.ms_. This adds significant latency. For clusters 
> at LinkedIn, we set _ipc.client.rpc-timeout.ms_ to 120 seconds and thus a 
> request takes more than 2 minutes to complete when we take a heap dump at a 
> standby. This has been causing user job failures. 
> We could set _ipc.client.rpc-timeout.ms_ to a smaller value when sending 
> getHAServiceState requests in ObserverReaderProxy (for user RPC requests, we 
> would still use the original value from the config). However, that would 
> double the number of socket connections between clients and the NN (which is 
> a deal-breaker). 
> The proposal is to add a timeout on getHAServiceState() calls in 
> ObserverReaderProxy, so that we wait only up to that timeout for an NN to 
> respond with its HA state. Once we pass that timeout, we move on to probe the 
> next NN. 
>  
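
The proposal above can be sketched in isolation as a bounded wait on a task submitted to a thread pool. This is a simplified illustration under stated assumptions, not the code from the PR: the class name, the generic `probe` helper, and the choice to return null on every failure are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of probing a remote state with a bounded wait. */
class ProbeWithTimeoutSketch {

  /**
   * Runs the call on the pool and waits at most timeoutMs for its result.
   * A timeout of zero or less waits indefinitely. Returns null when the
   * probe is rejected, times out, fails, or is interrupted, so the caller
   * can move on to the next candidate.
   */
  static <T> T probe(ExecutorService pool, Callable<T> call, long timeoutMs) {
    Future<T> task;
    try {
      task = pool.submit(call);
    } catch (RejectedExecutionException e) {
      return null; // no thread available to run the probe
    }
    try {
      return timeoutMs > 0
          ? task.get(timeoutMs, TimeUnit.MILLISECONDS)
          : task.get();
    } catch (TimeoutException e) {
      task.cancel(true); // interrupt the stuck probe
      return null;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      task.cancel(true);
      return null;
    } catch (ExecutionException e) {
      return null; // the probe itself threw
    }
  }
}
```

A caller could create a small pool with `Executors.newFixedThreadPool(2)`, call `probe(pool, () -> "ACTIVE", 1000L)`, and treat a null result as "state unknown, try the next NN".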






[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752281#comment-17752281
 ] 

ASF GitHub Bot commented on HDFS-17030:
---

xinglin commented on code in PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#discussion_r1288010220


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java:
##
@@ -285,13 +323,67 @@ private synchronized NNProxyInfo changeProxy(NNProxyInfo initial) {
     }
     currentIndex = (currentIndex + 1) % nameNodeProxies.size();
     currentProxy = createProxyIfNeeded(nameNodeProxies.get(currentIndex));
-    currentProxy.setCachedState(getHAServiceState(currentProxy));
+    currentProxy.setCachedState(getHAServiceStateWithTimeout(currentProxy));
     LOG.debug("Changed current proxy from {} to {}",
         initial == null ? "none" : initial.proxyInfo,
         currentProxy.proxyInfo);
     return currentProxy;
   }
 
+  /**
+   * Execute getHAServiceState() call with a timeout, to avoid a long wait when
+   * an NN becomes irresponsive to rpc requests
+   * (when a thread/heap dump is being taken, e.g.).
+   *
+   * For each getHAServiceState() call, a task is created and submitted to a
+   * threadpool for execution. We will wait for a response up to
+   * namenodeHAStateProbeTimeoutSec and cancel these requests if they time out.
+   *
+   * The implementation is split into two functions so that we can unit test
+   * the second function.
+   */
+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo proxyInfo) {
+    Callable<HAServiceState> getHAServiceStateTask = () -> getHAServiceState(proxyInfo);
+
+    try {
+      Future<HAServiceState> task =
+          nnProbingThreadPool.submit(getHAServiceStateTask);

Review Comment:
   Fixed. It fits on one line within 100 characters, so I did not bother splitting it into two lines.











[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752280#comment-17752280
 ] 

ASF GitHub Bot commented on HDFS-17030:
---

xinglin commented on code in PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#discussion_r1288009483


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java:
##
@@ -285,13 +323,67 @@ private synchronized NNProxyInfo changeProxy(NNProxyInfo initial) {
     }
     currentIndex = (currentIndex + 1) % nameNodeProxies.size();
     currentProxy = createProxyIfNeeded(nameNodeProxies.get(currentIndex));
-    currentProxy.setCachedState(getHAServiceState(currentProxy));
+    currentProxy.setCachedState(getHAServiceStateWithTimeout(currentProxy));
     LOG.debug("Changed current proxy from {} to {}",
         initial == null ? "none" : initial.proxyInfo,
         currentProxy.proxyInfo);
     return currentProxy;
   }
 
+  /**
+   * Execute getHAServiceState() call with a timeout, to avoid a long wait when
+   * an NN becomes irresponsive to rpc requests
+   * (when a thread/heap dump is being taken, e.g.).
+   *
+   * For each getHAServiceState() call, a task is created and submitted to a
+   * threadpool for execution. We will wait for a response up to
+   * namenodeHAStateProbeTimeoutSec and cancel these requests if they time out.
+   *
+   * The implementation is split into two functions so that we can unit test
+   * the second function.
+   */
+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo proxyInfo) {
+    Callable<HAServiceState> getHAServiceStateTask = () -> getHAServiceState(proxyInfo);
+
+    try {
+      Future<HAServiceState> task =
+          nnProbingThreadPool.submit(getHAServiceStateTask);
+      return getHAServiceStateWithTimeout(proxyInfo, task);
+    } catch (RejectedExecutionException e) {
+      LOG.warn("Run out of threads to submit the request to query HA state. "
+          + "Ok to return null and we will fallback to use active NN to serve "
+          + "this request.");
+      return null;
+    }
+  }
+
+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo proxyInfo,
+      Future<HAServiceState> task) {
+    HAServiceState state = null;
+    try {
+      if (namenodeHAStateProbeTimeoutMs > 0) {
+        state = task.get(namenodeHAStateProbeTimeoutMs, TimeUnit.MILLISECONDS);
+      } else {
+        // Disable timeout by waiting indefinitely when namenodeHAStateProbeTimeoutSec is set to 0
+        // or a negative value.
+        state = task.get();
+      }
+      LOG.debug("HA State for {} is {}", proxyInfo.proxyInfo, state);
+    } catch (TimeoutException e) {
+      // Cancel the task on timeout
+      String msg = String.format("Cancel NN probe task due to timeout for %s", proxyInfo.proxyInfo);
+      LOG.warn(msg, e);
+      if (task != null) {

Review Comment:
   removed.









[jira] [Assigned] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.

2023-08-08 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba reassigned HDFS-17149:


Assignee: farmmamba







[jira] [Commented] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.

2023-08-08 Thread farmmamba (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752243#comment-17752243
 ] 

farmmamba commented on HDFS-17149:
--

[~hexiaoqiao] [~ayushsaxena] [~tomscut] [~zhangshuyan] Sorry to disturb you 
here. Please have a look at this issue when you have time, and correct me if 
I have misunderstood. Thanks, all.

> getBlockLocations RPC should use actual client ip to compute network distance 
> when using RBF.
> -
>
> Key: HDFS-17149
> URL: https://issues.apache.org/jira/browse/HDFS-17149
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Priority: Major
>
> Please correct me if I have misunderstood. Thanks.
> Currently, when a getBlockLocations RPC is forwarded to the NameNode via a 
> Router, the NameNode uses the Router's IP address as the client machine when 
> computing network distance to the block's locations. See the 
> FSNamesystem#sortLocatedBlocksMore method for details.
> I think this computation is incorrect and should use the actual client IP.
>  






[jira] [Created] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.

2023-08-08 Thread farmmamba (Jira)
farmmamba created HDFS-17149:


 Summary: getBlockLocations RPC should use actual client ip to 
compute network distance when using RBF.
 Key: HDFS-17149
 URL: https://issues.apache.org/jira/browse/HDFS-17149
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namanode
Affects Versions: 3.4.0
Reporter: farmmamba


Please correct me if I have misunderstood. Thanks.

Currently, when a getBlockLocations RPC is forwarded to the NameNode via a 
Router, the NameNode uses the Router's IP address as the client machine when 
computing network distance to the block's locations. See the 
FSNamesystem#sortLocatedBlocksMore method for details.

I think this computation is incorrect and should use the actual client IP.

 






[jira] [Created] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL

2023-08-08 Thread Hector Sandoval Chaverri (Jira)
Hector Sandoval Chaverri created HDFS-17148:
---

 Summary: RBF: SQLDelegationTokenSecretManager must cleanup expired 
tokens in SQL
 Key: HDFS-17148
 URL: https://issues.apache.org/jira/browse/HDFS-17148
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: Hector Sandoval Chaverri


The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them 
temporarily in a memory cache with a short TTL. The ExpiredTokenRemover in 
AbstractDelegationTokenSecretManager runs periodically to clean up any expired 
tokens in the cache, but by then most tokens have already been evicted 
automatically per the TTL configuration. As a result, many expired tokens 
remain in the SQL database and should be cleaned up.

The SQLDelegationTokenSecretManager should find expired tokens in SQL instead 
of in the memory cache when running the periodic cleanup.
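
The gap described above can be shown with a small self-contained model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and a plain in-memory map stands in for the SQL table, so none of this is the SQLDelegationTokenSecretManager API.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical model: a map stands in for the SQL table of tokens. */
class ExpiredTokenCleanupSketch {
  private final Map<Integer, Long> sqlTable = new HashMap<>(); // id -> expiry (ms)
  private final Map<Integer, Long> cache = new HashMap<>();    // short-TTL cache

  void store(int id, long expiryMs) {
    sqlTable.put(id, expiryMs);
    cache.put(id, expiryMs);
  }

  /** Simulates TTL eviction: the token leaves the cache but stays in SQL. */
  void evictFromCache(int id) {
    cache.remove(id);
  }

  /** Cleanup driven by the cache: misses tokens that were already evicted. */
  int removeExpiredViaCache(long nowMs) {
    int removed = 0;
    Iterator<Map.Entry<Integer, Long>> it = cache.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Integer, Long> e = it.next();
      if (e.getValue() < nowMs) {
        sqlTable.remove(e.getKey());
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  /** Cleanup driven by the backing store: finds every expired token. */
  int removeExpiredViaStore(long nowMs) {
    int removed = 0;
    Iterator<Map.Entry<Integer, Long>> it = sqlTable.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Integer, Long> e = it.next();
      if (e.getValue() < nowMs) {
        cache.remove(e.getKey());
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  int rowCount() {
    return sqlTable.size();
  }
}
```

In a real deployment the store-driven pass would presumably be a query against the token table filtered on expiry time; the exact schema and query are not given in the issue text.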






[jira] [Commented] (HDFS-16977) Forbid assigned characters in pathname.

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752125#comment-17752125
 ] 

ASF GitHub Bot commented on HDFS-16977:
---

hadoop-yetus commented on PR #5547:
URL: https://github.com/apache/hadoop/pull/5547#issuecomment-1669950199

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 3 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m  3s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  32m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 37s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   5m 32s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   1m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m  0s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 26s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   5m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  36m  9s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 26s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   5m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   5m 18s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   5m 18s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 18s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) |  hadoop-hdfs-project: The patch generated 3 new + 395 unchanged - 0 fixed = 398 total (was 395)  |
   | +1 :green_heart: |  mvnsite  |   2m 10s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 36s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   2m 11s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   5m 46s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  35m 55s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   2m 29s |  |  hadoop-hdfs-client in the patch passed.  |
   | -1 :x: |  unit  | 221m 18s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 56s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 397m  0s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   |   | hadoop.hdfs.server.datanode.TestDirectoryScanner |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5547 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux aa50f62265b1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin

[jira] [Commented] (HDFS-17093) In the case of all datanodes sending FBR when the namenode restarts (large clusters), there is an issue with incomplete block reporting

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752071#comment-17752071
 ] 

ASF GitHub Bot commented on HDFS-17093:
---

hadoop-yetus commented on PR #5855:
URL: https://github.com/apache/hadoop/pull/5855#issuecomment-1669741496

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 30s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m  7s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 52s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 49s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  checkstyle  |   0m 47s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 55s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 11s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m  2s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 46s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 42s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 42s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/blanks-eol.txt) |  The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  spotbugs  |   1m 53s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 21s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 202m 35s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 40s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 297m 10s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d0dcaa24cb96 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5af06d98849707bed42863172dc38247aba428c8 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/testReport/ |
   | Max

[jira] [Commented] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752003#comment-17752003
 ] 

ASF GitHub Bot commented on HDFS-17137:
---

haiyang1987 commented on PR #5913:
URL: https://github.com/apache/hadoop/pull/5913#issuecomment-1669411025

   Thanks @Hexiaoqiao and @tomscut for helping to review and merge!




> Standby/Observer NameNode skip to handle redundant replica block logic when 
> set decrease replication. 
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The Standby/Observer NameNode should not run the redundant-replica handling 
> logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The ActiveNameNode calls BlockManager#processExtraRedundancyBlock to select 
> the DN holding the redundant replica, adds it to excessRedundancyMap, and 
> adds the block to invalidateBlocks (RedundancyMonitor is then scheduled to 
> delete the block on that DN).
> * The StandbyNameNode or ObserverNameNode then loads the edit log and applies 
> the SetReplicationOp. If the DN of the replica to be deleted has not yet sent 
> its incremental block report, BlockManager#processExtraRedundancyBlock is 
> also called here to select the DN of the redundant replica and add it to 
> excessRedundancyMap (the DN selected here may differ from the one selected 
> by the active NameNode).
> A DN that remains in excessRedundancyMap may affect DN decommissioning, so 
> that the decommission operation cannot complete on the Standby/Observer 
> NameNode.
> A concrete case:
> Suppose a file has 3 replicas (d1, d2, d3) and setReplication is called to 
> set it to 2 replicas.
> * The ActiveNameNode selects d1 as the redundant replica and adds it to 
> excessRedundancyMap and invalidateBlocks.
> * The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not 
> yet sent its incremental block report), so it may select a different 
> redundant replica than the ActiveNameNode, for example adding d2 to 
> excessRedundancyMap.
> * d1 then finishes deleting the block and sends its incremental block report.
> * The DN list for this block in the ActiveNameNode is d2 and d3 (d1 is 
> removed from excessRedundancyMap when the incremental block report is 
> processed).
> * The DN list for this block in the StandbyNameNode is d2 and d3 (d2 cannot 
> be removed from excessRedundancyMap when the incremental block report is 
> processed).
> Now execute the decommission operation on d3.
> * The ActiveNameNode selects a new node d4 to copy the replica to, and d4 
> sends an incremental block report.
> * The DN list for this block in the ActiveNameNode is d2, d3 (decommissioning 
> status), and d4, so d3 can be decommissioned normally.
> * The DN list for this block in the StandbyNameNode is d3 (decommissioning 
> status), d2 (redundant status), and d4. Since the requirement of two live 
> replicas is not met, d3 cannot be decommissioned at this time.
> Therefore, the StandbyNameNode or ObserverNameNode should not run the 
> redundant-replica logic when setReplication is called.






[jira] [Commented] (HDFS-16977) Forbid assigned characters in pathname.

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751984#comment-17751984
 ] 

ASF GitHub Bot commented on HDFS-16977:
---

YuanbenWang closed pull request #5547: HDFS-16977. Forbid assigned characters 
in pathname.
URL: https://github.com/apache/hadoop/pull/5547




> Forbid assigned characters in pathname.
> ---
>
> Key: HDFS-16977
> URL: https://issues.apache.org/jira/browse/HDFS-16977
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: dfsclient, namenode
>Affects Versions: 3.3.4
>Reporter: WangYuanben
>Priority: Minor
>  Labels: pull-request-available
> Attachments: HDFS-16977__Forbid_assigned_characters_in_pathname_.patch
>
>
> Some pathnames that contain special characters may lead to unexpected 
> results. For example, there is a file named "/foo/file*" in my cluster, 
> created by "DistributedFileSystem.create(new Path("/foo/file*"))". When I 
> want to remove it, I type "hadoop fs -rm /foo/file*" in the shell. However, 
> this unexpectedly removes all files with the prefix "/foo/file". There are 
> other characters that behave like '*', such as ' ', '|', '&', etc.
>  
> Therefore, it is necessary to restrict the occurrence of these characters in 
> pathnames. A simple but effective way is to forbid the assigned characters in 
> a pathname when a new file or directory is created.
>  
> It is also important to add the same function to the Router module and the 
> WebHdfs module. I will add them as two subtasks later.
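
The check proposed in the description could look roughly like the sketch below. This is only an illustration under stated assumptions: PathnameValidator, isValid and check are hypothetical names that do not exist in Hadoop, and the forbidden set is just the four characters the description lists ('*', ' ', '|', '&'), not the set used by the actual patch.

```java
// Hypothetical helper, not part of Hadoop: rejects pathnames that contain
// characters a shell would glob or interpret.
final class PathnameValidator {
  // Assumed forbidden set, taken from the examples in the issue description.
  private static final String FORBIDDEN = "* |&";

  private PathnameValidator() {}

  /** Returns true if no forbidden character occurs anywhere in the path. */
  static boolean isValid(String path) {
    for (int i = 0; i < path.length(); i++) {
      if (FORBIDDEN.indexOf(path.charAt(i)) >= 0) {
        return false;
      }
    }
    return true;
  }

  /** Throws IllegalArgumentException if the path has a forbidden character. */
  static void check(String path) {
    if (!isValid(path)) {
      throw new IllegalArgumentException(
          "Forbidden character in pathname: " + path);
    }
  }
}
```

A create or mkdir call would invoke check(path) before doing any work, so "/foo/file*" is rejected at creation time while "/foo/file" is accepted.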






[jira] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)


[ https://issues.apache.org/jira/browse/HDFS-17147 ]


Yuanbo Liu deleted comment on HDFS-17147:
---

was (Author: yuanbo):
cc [~inigoiri]  [~zhangshuyan] [~ayushsaxena] ,

> RBF: RouterRpcServer getListing become extremely slow when the children of 
> the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Suppose we mount table as below:
>  
> {code:java}
> /dir -> ns0 ->  /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
>  
>  
> when listing /dir with RBF, it's getting extremely slow as getListing has two 
> parts:
> 1. list all children of  /target/dir
> 2. append the rest 200 mount points to the result.
>  
> The second part invokes getFileInfo concurrently to make sure mount points 
> are accessed with the proper permissions. But in this case, the first part 
> already includes the result of the second part, so there is no need to append 
> the second part again.
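
One way to picture the proposed shortcut is the sketch below, which keeps only the mount points that the remote listing did not already return, so that getFileInfo is needed just for those. It is a simplified illustration, not the RouterClientProtocol code: MountPointListingFilter and missingMountPoints are hypothetical names, and it assumes each mount point resolves to the same namespace and the same child name as the listed entry, as in the mount table above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decide which mount points still need a getFileInfo
// call after the listing of the target directory has returned.
final class MountPointListingFilter {
  private MountPointListingFilter() {}

  /**
   * @param listedChildren child names already returned by the remote listing
   * @param mountChildren  child names of the mount points under the same dir
   * @return mount point names absent from the listing, in original order
   */
  static List<String> missingMountPoints(List<String> listedChildren,
                                         List<String> mountChildren) {
    Set<String> seen = new HashSet<>(listedChildren);
    List<String> missing = new ArrayList<>();
    for (String child : mountChildren) {
      // add() returns false when the name is already present.
      if (seen.add(child)) {
        missing.add(child);
      }
    }
    return missing;
  }
}
```

With the mount table from the description, every child from child1 to child200 is already in the listing of /target/dir, so the returned list is empty and no extra getFileInfo calls are needed.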






[jira] [Commented] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751980#comment-17751980
 ] 

Yuanbo Liu commented on HDFS-17147:
---

cc [~inigoiri]  [~zhangshuyan] [~ayushsaxena] ,

> RBF: RouterRpcServer getListing become extremely slow when the children of 
> the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Suppose we mount table as below:
>  
> {code:java}
> /dir -> ns0 ->  /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
>  
>  
> when listing /dir with RBF, it's getting extremely slow as getListing has two 
> parts:
> 1. list all children of  /target/dir
> 2. append the rest 200 mount points to the result.
>  
> The second part invokes getFileInfo concurrently to make sure mount points 
> are accessed with the proper permissions. But in this case, the first part 
> already includes the result of the second part, so there is no need to append 
> the second part again.






[jira] [Commented] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751979#comment-17751979
 ] 

Yuanbo Liu commented on HDFS-17147:
---

cc [~inigoiri]  [~zhangshuyan] [~ayushsaxena] ,

> RBF: RouterRpcServer getListing become extremely slow when the children of 
> the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Suppose we mount table as below:
>  
> {code:java}
> /dir -> ns0 ->  /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
>  
>  
> when listing /dir with RBF, it's getting extremely slow as getListing has two 
> parts:
> 1. list all children of  /target/dir
> 2. append the rest 200 mount points to the result.
>  
> The second part invokes getFileInfo concurrently to make sure mount points 
> are accessed with the proper permissions. But in this case, the first part 
> already includes the result of the second part, so there is no need to append 
> the second part again.






[jira] [Updated] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-17147:
--
Component/s: rbf

> RBF: RouterRpcServer getListing become extremely slow when the children of 
> the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Suppose we mount table as below:
> /dir -> ns0 ->  /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
>  
> when listing /dir with RBF, it's getting extremely slow as getListing has two 
> parts:
> 1. list all children of  /target/dir
> 2. append the rest 200 mount points to the result.
>  
> The second part invokes getFileInfo concurrently to make sure mount points 
> are accessed with the proper permissions. But in this case, the first part 
> already includes the result of the second part, so there is no need to append 
> the second part again.






[jira] [Updated] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanbo Liu updated HDFS-17147:
--
Description: 
Suppose we mount table as below:

 
{code:java}
/dir -> ns0 ->  /target/dir
/dir/child1 -> ns0 -> /target/dir/child1
/dir/child2 -> ns0 -> /target/dir/child2
..
/dir/child200 -> ns0 -> /target/dir/child200
{code}
 

 

when listing /dir with RBF, it's getting extremely slow as getListing has two 
parts:
1. list all children of  /target/dir

2. append the rest 200 mount points to the result.

 

The second part invokes getFileInfo concurrently to make sure mount points 
are accessed with the proper permissions. But in this case, the first part 
already includes the result of the second part, so there is no need to append 
the second part again.

  was:
Suppose we mount table as below:

/dir -> ns0 ->  /target/dir

/dir/child1 -> ns0 -> /target/dir/child1

/dir/child2 -> ns0 -> /target/dir/child2

..

/dir/child200 -> ns0 -> /target/dir/child200

 

when listing /dir with RBF, it's getting extremely slow as getListing has two 
parts:
1. list all children of  /target/dir

2. append the rest 200 mount points to the result.

 

The second part invokes getFileInfo concurrently to make sure mount points 
are accessed with the proper permissions. But in this case, the first part 
already includes the result of the second part, so there is no need to append 
the second part again.

> RBF: RouterRpcServer getListing become extremely slow when the children of 
> the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Priority: Major
>
> Suppose we mount table as below:
>  
> {code:java}
> /dir -> ns0 ->  /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
>  
>  
> when listing /dir with RBF, it's getting extremely slow as getListing has two 
> parts:
> 1. list all children of  /target/dir
> 2. append the rest 200 mount points to the result.
>  
> The second part invokes getFileInfo concurrently to make sure mount points 
> are accessed with the proper permissions. But in this case, the first part 
> already includes the result of the second part, so there is no need to append 
> the second part again.






[jira] [Created] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.

2023-08-08 Thread Yuanbo Liu (Jira)
Yuanbo Liu created HDFS-17147:
-

 Summary: RBF: RouterRpcServer getListing become extremely slow 
when the children of the dir are mounted in the same ns.
 Key: HDFS-17147
 URL: https://issues.apache.org/jira/browse/HDFS-17147
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Yuanbo Liu


Suppose we mount table as below:

/dir -> ns0 ->  /target/dir

/dir/child1 -> ns0 -> /target/dir/child1

/dir/child2 -> ns0 -> /target/dir/child2

..

/dir/child200 -> ns0 -> /target/dir/child200

 

when listing /dir with RBF, it's getting extremely slow as getListing has two 
parts:
1. list all children of  /target/dir

2. append the rest 200 mount points to the result.

 

The second part invokes getFileInfo concurrently to make sure mount points 
are accessed with the proper permissions. But in this case, the first part 
already includes the result of the second part, so there is no need to append 
the second part again.





[jira] [Commented] (HDFS-17140) Optimize the BPOfferService.reportBadBlocks() method

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751937#comment-17751937
 ] 

ASF GitHub Bot commented on HDFS-17140:
---

slfan1989 commented on code in PR #5924:
URL: https://github.com/apache/hadoop/pull/5924#discussion_r1286759399


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -291,9 +291,8 @@ public String toString() {
   void reportBadBlocks(ExtendedBlock block,
String storageUuid, StorageType storageType) {
 checkBlock(block);
+ReportBadBlockAction rbbAction = new ReportBadBlockAction(block, storageUuid, storageType);

Review Comment:
   Thank you very much for your explanation! Personally, I believe this 
modification seems a bit forced and unnecessary. 
   Let's wait for @2005hithlj  to give a detailed explanation.





> Optimize the BPOfferService.reportBadBlocks() method
> 
>
> Key: HDFS-17140
> URL: https://issues.apache.org/jira/browse/HDFS-17140
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Liangjun He
>Assignee: Liangjun He
>Priority: Minor
>  Labels: pull-request-available
>
> The current BPOfferService.reportBadBlocks() method can be optimized by 
> moving the creation of the rbbAction object outside the loop.
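
The change under review, creating the action once before the loop instead of once per actor, can be sketched with stand-in types as follows. BadBlockAction, Actor and BadBlockReporter are hypothetical placeholders rather than the Hadoop classes, and the sketch assumes the action object carries no per-actor state, which is what makes sharing one instance safe.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; they are not the Hadoop classes.
final class BadBlockAction {
  final String blockId;

  BadBlockAction(String blockId) {
    this.blockId = blockId;
  }
}

final class Actor {
  final List<BadBlockAction> queue = new ArrayList<>();

  void enqueue(BadBlockAction action) {
    queue.add(action);
  }
}

final class BadBlockReporter {
  private BadBlockReporter() {}

  /** Builds the action once and hands the same instance to every actor. */
  static void reportBadBlock(String blockId, List<Actor> actors) {
    BadBlockAction action = new BadBlockAction(blockId); // hoisted out of the loop
    for (Actor actor : actors) {
      actor.enqueue(action);
    }
  }
}
```

The saving is one allocation per additional actor, which is small; that is consistent with the reviewer's remark that the change may not be strictly necessary.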






[jira] [Commented] (HDFS-17145) Fix description of property dfs.namenode.file.close.num-committed-allowed.

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751935#comment-17751935
 ] 

ASF GitHub Bot commented on HDFS-17145:
---

hadoop-yetus commented on PR #5933:
URL: https://github.com/apache/hadoop/pull/5933#issuecomment-1669119407

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 29s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 11s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 50s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  mvnsite  |   0m 56s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  shadedclient  |  57m 33s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 48s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 48s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  javac  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  mvnsite  |   0m 47s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 39s |  |  the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05  |
   | +1 :green_heart: |  shadedclient  |  23m 48s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | -1 :x: |  unit  | 198m 41s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 287m 14s |  |  |
   
   
   | Reason | Tests |
   |-------:|:------|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5933 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint |
   | uname | Linux b57df667dd49 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / fb8b16d54e29e3309d09130814e4f2c34dc6e1b2 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/testReport/ |
   | Max. process+thread count | 3658 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered

[jira] [Updated] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.

2023-08-08 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-17137:
---
Component/s: namenode

> Standby/Observer NameNode skip to handle redundant replica block logic when 
> set decrease replication. 
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The Standby/Observer NameNode should not run the redundant-replica handling 
> logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The ActiveNameNode calls BlockManager#processExtraRedundancyBlock to 
> select the dn holding the redundant replica, adds it to the 
> excessRedundancyMap, and adds the block to invalidateBlocks (the 
> RedundancyMonitor is then scheduled to delete the block on that dn).
> * The StandbyNameNode or ObserverNameNode then loads the editlog and applies 
> the SetReplicationOp. If the dn whose replica is to be deleted has not yet 
> sent its incremental block report, BlockManager#processExtraRedundancyBlock 
> is called here as well to select a redundant dn and add it to the 
> excessRedundancyMap (the dn selected here may differ from the one selected 
> on the active namenode).
> A dn left in the excessRedundancyMap may affect dn decommission, so that the 
> decommission operation cannot complete on the Standby/ObserverNameNode.
> The specific case is as follows:
> For example, a file has 3 replicas (d1, d2, d3) and setReplication is called 
> to set it to 2 replicas.
> * The ActiveNameNode selects d1 as the redundant replica and adds it to 
> excessRedundancyMap and invalidateBlocks.
> * The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not 
> yet sent its incremental block report), so the redundant dn it selects may 
> differ from the ActiveNameNode's choice; for example, it selects d2 and adds 
> it to excessRedundancyMap.
> * d1 then finishes deleting the block and sends its incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2 and d3 (d1 is 
> removed from the excessRedundancyMap when the incremental block report is 
> processed).
> * The DN list for this block on the StandbyNameNode includes d2 and d3 (d2 
> cannot be removed from the excessRedundancyMap when the incremental block 
> report is processed).
> At this point, a decommission operation is executed on d3.
> * The ActiveNameNode selects a new node d4 to copy the replica to, and d4 
> sends an incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2, d3 
> (decommissioning status), and d4, so d3 can be decommissioned normally.
> * The DN list for this block on the StandbyNameNode is d3 (decommissioning 
> status), d2 (redundant status), and d4. Since the requirement of two live 
> replicas is not met, d3 cannot be decommissioned at this time.
> Therefore, the StandbyNameNode or ObserverNameNode should skip the 
> redundant-replica logic when setReplication is applied.
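
The d1..d4 scenario above can be replayed with a small model of the live-replica count. This is a simplified sketch, not the BlockManager logic: DecommissionCheck, liveReplicas and canDecommission are hypothetical names, and a replica is assumed to count as live only when its dn is neither in the excess set nor decommissioning.

```java
import java.util.Set;

// Hypothetical model of the scenario above; it does not use BlockManager.
final class DecommissionCheck {
  private DecommissionCheck() {}

  /** Counts replicas that are neither marked excess nor decommissioning. */
  static int liveReplicas(Set<String> holders, Set<String> excess,
                          Set<String> decommissioning) {
    int live = 0;
    for (String dn : holders) {
      if (!excess.contains(dn) && !decommissioning.contains(dn)) {
        live++;
      }
    }
    return live;
  }

  /** Decommission can finish once enough live replicas exist elsewhere. */
  static boolean canDecommission(Set<String> holders, Set<String> excess,
                                 Set<String> decommissioning, int expected) {
    return liveReplicas(holders, excess, decommissioning) >= expected;
  }
}
```

With holders {d2, d3, d4}, d3 decommissioning and an expected replication of 2, the active view (empty excess set) counts two live replicas and allows the decommission, while the standby view (excess set {d2}) counts only one and blocks it.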






[jira] [Resolved] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.

2023-08-08 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HDFS-17137.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Standby/Observer NameNode skip to handle redundant replica block logic when 
> set decrease replication. 
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The Standby/Observer NameNode should not run the redundant-replica handling 
> logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The ActiveNameNode calls BlockManager#processExtraRedundancyBlock to 
> select the dn holding the redundant replica, adds it to the 
> excessRedundancyMap, and adds the block to invalidateBlocks (the 
> RedundancyMonitor is then scheduled to delete the block on that dn).
> * The StandbyNameNode or ObserverNameNode then loads the editlog and applies 
> the SetReplicationOp. If the dn whose replica is to be deleted has not yet 
> sent its incremental block report, BlockManager#processExtraRedundancyBlock 
> is called here as well to select a redundant dn and add it to the 
> excessRedundancyMap (the dn selected here may differ from the one selected 
> on the active namenode).
> A dn left in the excessRedundancyMap may affect dn decommission, so that the 
> decommission operation cannot complete on the Standby/ObserverNameNode.
> The specific case is as follows:
> For example, a file has 3 replicas (d1, d2, d3) and setReplication is called 
> to set it to 2 replicas.
> * The ActiveNameNode selects d1 as the redundant replica and adds it to 
> excessRedundancyMap and invalidateBlocks.
> * The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not 
> yet sent its incremental block report), so the redundant dn it selects may 
> differ from the ActiveNameNode's choice; for example, it selects d2 and adds 
> it to excessRedundancyMap.
> * d1 then finishes deleting the block and sends its incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2 and d3 (d1 is 
> removed from the excessRedundancyMap when the incremental block report is 
> processed).
> * The DN list for this block on the StandbyNameNode includes d2 and d3 (d2 
> cannot be removed from the excessRedundancyMap when the incremental block 
> report is processed).
> At this point, a decommission operation is executed on d3.
> * The ActiveNameNode selects a new node d4 to copy the replica to, and d4 
> sends an incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2, d3 
> (decommissioning status), and d4, so d3 can be decommissioned normally.
> * The DN list for this block on the StandbyNameNode is d3 (decommissioning 
> status), d2 (redundant status), and d4. Since the requirement of two live 
> replicas is not met, d3 cannot be decommissioned at this time.
> Therefore, the StandbyNameNode or ObserverNameNode should skip the 
> redundant-replica logic when setReplication is applied.






[jira] [Updated] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.

2023-08-08 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He updated HDFS-17137:
---
Summary: Standby/Observer NameNode skip to handle redundant replica block 
logic when set decrease replication.   (was:  Standby/Observer NameNode should 
not  handle redundant replica block logic  when set decrease replication)

> Standby/Observer NameNode skip to handle redundant replica block logic when 
> set decrease replication. 
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> The Standby/Observer NameNode should not run the redundant-replica handling 
> logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The ActiveNameNode calls BlockManager#processExtraRedundancyBlock to 
> select the dn holding the redundant replica, adds it to the 
> excessRedundancyMap, and adds the block to invalidateBlocks (the 
> RedundancyMonitor is then scheduled to delete the block on that dn).
> * The StandbyNameNode or ObserverNameNode then loads the editlog and applies 
> the SetReplicationOp. If the dn whose replica is to be deleted has not yet 
> sent its incremental block report, BlockManager#processExtraRedundancyBlock 
> is called here as well to select a redundant dn and add it to the 
> excessRedundancyMap (the dn selected here may differ from the one selected 
> on the active namenode).
> A dn left in the excessRedundancyMap may affect dn decommission, so that the 
> decommission operation cannot complete on the Standby/ObserverNameNode.
> The specific case is as follows:
> For example, a file has 3 replicas (d1, d2, d3) and setReplication is called 
> to set it to 2 replicas.
> * The ActiveNameNode selects d1 as the redundant replica and adds it to 
> excessRedundancyMap and invalidateBlocks.
> * The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not 
> yet sent its incremental block report), so the redundant dn it selects may 
> differ from the ActiveNameNode's choice; for example, it selects d2 and adds 
> it to excessRedundancyMap.
> * d1 then finishes deleting the block and sends its incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2 and d3 (d1 is 
> removed from the excessRedundancyMap when the incremental block report is 
> processed).
> * The DN list for this block on the StandbyNameNode includes d2 and d3 (d2 
> cannot be removed from the excessRedundancyMap when the incremental block 
> report is processed).
> At this point, a decommission operation is executed on d3.
> * The ActiveNameNode selects a new node d4 to copy the replica to, and d4 
> sends an incremental block report.
> * The DN list for this block on the ActiveNameNode includes d2, d3 
> (decommissioning status), and d4, so d3 can be decommissioned normally.
> * The DN list for this block on the StandbyNameNode is d3 (decommissioning 
> status), d2 (redundant status), and d4. Since the requirement of two live 
> replicas is not met, d3 cannot be decommissioned at this time.
> Therefore, the StandbyNameNode or ObserverNameNode should skip the 
> redundant-replica logic when setReplication is applied.






[jira] [Commented] (HDFS-17137) Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751926#comment-17751926
 ] 

ASF GitHub Bot commented on HDFS-17137:
---

Hexiaoqiao commented on PR #5913:
URL: https://github.com/apache/hadoop/pull/5913#issuecomment-1669086283

   The failed unit test is not related to this change. Committed to trunk.
   Thanks @haiyang1987 for your contribution and @tomscut for the review!




>  Standby/Observer NameNode should not  handle redundant replica block logic  
> when set decrease replication
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> Standby/Observer NameNode should not handle redundant replica block logic
> when setReplication decreases replication.
> At present, when setReplication is called to decrease replication:
> * The ActiveNameNode calls BlockManager#processExtraRedundancyBlock to
> select the dn holding the redundant replica, adds it to the
> excessRedundancyMap, and adds the block to invalidateBlocks (the
> RedundancyMonitor is then scheduled to delete the block on that dn).
> * The StandbyNameNode or ObserverNameNode then loads the editlog and applies
> the SetReplicationOp. If the dn of the replica to be deleted has not yet
> sent its incremental block report,
> BlockManager#processExtraRedundancyBlock is also called here to select the
> dn of the redundant replica and add it to the excessRedundancyMap (the dn
> selected here may differ from the dn selected by the active namenode).
> A dn left in the excessRedundancyMap may affect dn decommission, so that the
> decommission cannot complete on the Standby/ObserverNameNode.
> A specific case is as follows:
> For example, a file has 3 replicas (d1,d2,d3) and setReplication is called to
> set the file to 2 replicas.
> * The ActiveNameNode selects d1 as the redundant replica and adds it to the
> excessRedundancyMap and invalidateBlocks.
> * The StandbyNameNode replays the SetReplicationOp (at this time, d1 has not
> yet sent its incremental block report), so the redundant replica dn selected
> here may differ from the ActiveNameNode's choice; for example, d2 is added
> to the excessRedundancyMap.
> * At this time, d1 finishes deleting the block and sends its incremental
> block report.
> * The DN list for this block in the ActiveNameNode includes d2 and d3 (d1 is
> deleted from the excessRedundancyMap when the incremental block report is
> processed).
> * The DN list for this block in the StandbyNameNode includes d2 and d3 (d2
> cannot be deleted from the excessRedundancyMap when the incremental block
> report is processed).
> At this time, the decommission operation is executed on d3.
> * The ActiveNameNode selects a new node d4 to copy the replica to, and d4
> sends an incremental block report.
> * The DN list for this block in the ActiveNameNode includes d2, d3
> (decommissioning status), and d4, so d3 can be decommissioned normally.
> * The DN list for this block in the StandbyNameNode is d3 (decommissioning
> status), d2 (redundant status), and d4.
> Since the requirement for two live replicas is not met, d3 cannot be
> decommissioned at this time.
> Therefore, the StandbyNameNode or ObserverNameNode should not process the
> redundant replica logic when setReplication is called.
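
The d1/d2/d3/d4 scenario above can be traced with a small toy model. This is
only a sketch under stated assumptions: NameNodeView, canDecommissionD3 and
the standbySkipsExcessLogic flag are hypothetical names invented for
illustration, and none of this is the real BlockManager code or the actual
patch. It only models the bookkeeping described in the issue: a replica
counts as live if it is neither in the excess set nor decommissioning.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExcessRedundancySketch {

  /** Hypothetical, simplified view of one block as seen by one NameNode. */
  static final class NameNodeView {
    private final boolean active;
    private final Set<String> replicas = new HashSet<>();
    private final Set<String> excess = new HashSet<>();
    private final Set<String> decommissioning = new HashSet<>();

    NameNodeView(boolean active, List<String> initialReplicas) {
      this.active = active;
      this.replicas.addAll(initialReplicas);
    }

    /**
     * Models a replication decrease. chosenDn stands in for whatever the
     * replica selection picks; a standby skips the excess bookkeeping when
     * standbySkipsExcessLogic is true (the behaviour the issue proposes).
     */
    void decreaseReplication(String chosenDn, boolean standbySkipsExcessLogic) {
      if (active || !standbySkipsExcessLogic) {
        excess.add(chosenDn);
      }
    }

    /** Incremental block report: the dn deleted its replica. */
    void reportDeleted(String dn) {
      replicas.remove(dn);
      excess.remove(dn);
    }

    /** Incremental block report: the dn received a new replica. */
    void reportReceived(String dn) {
      replicas.add(dn);
    }

    void startDecommission(String dn) {
      decommissioning.add(dn);
    }

    /** Live replicas are neither marked excess nor decommissioning. */
    long liveReplicas() {
      return replicas.stream()
          .filter(dn -> !excess.contains(dn) && !decommissioning.contains(dn))
          .count();
    }
  }

  /** Replays the scenario from the issue and asks whether d3 can finish. */
  static boolean canDecommissionD3(boolean active, String chosenDn,
      boolean standbySkipsExcessLogic) {
    NameNodeView nn = new NameNodeView(active, Arrays.asList("d1", "d2", "d3"));
    nn.decreaseReplication(chosenDn, standbySkipsExcessLogic);
    nn.reportDeleted("d1");      // d1 deletes the block and reports it
    nn.startDecommission("d3");
    nn.reportReceived("d4");     // d4 receives the copied replica
    return nn.liveReplicas() >= 2;
  }

  public static void main(String[] args) {
    System.out.println("active, picks d1:           "
        + canDecommissionD3(true, "d1", false));
    System.out.println("standby, picks d2 (current): "
        + canDecommissionD3(false, "d2", false));
    System.out.println("standby, skips excess logic: "
        + canDecommissionD3(false, "d2", true));
  }
}
```

In this model the standby that still marks d2 as excess ends up with a single
live replica (d4), so d3 stays stuck; once the standby skips the excess
bookkeeping, d2 and d4 both count as live and the decommission can finish.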






[jira] [Commented] (HDFS-17137) Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751925#comment-17751925
 ] 

ASF GitHub Bot commented on HDFS-17137:
---

Hexiaoqiao merged PR #5913:
URL: https://github.com/apache/hadoop/pull/5913










[jira] [Commented] (HDFS-17140) Optimize the BPOfferService.reportBadBlocks() method

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751923#comment-17751923
 ] 

ASF GitHub Bot commented on HDFS-17140:
---

Hexiaoqiao commented on code in PR #5924:
URL: https://github.com/apache/hadoop/pull/5924#discussion_r1286718143


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##
@@ -291,9 +291,8 @@ public String toString() {
   void reportBadBlocks(ExtendedBlock block,
       String storageUuid, StorageType storageType) {
     checkBlock(block);
+    ReportBadBlockAction rbbAction = new ReportBadBlockAction(block, storageUuid, storageType);

Review Comment:
   I am not worried about that, because `BPOfferService` is isolated per
namespace. But I wonder what the improvement is here: saving the cost of
object creation and the heap footprint?





> Optimize the BPOfferService.reportBadBlocks() method
> 
>
> Key: HDFS-17140
> URL: https://issues.apache.org/jira/browse/HDFS-17140
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: Liangjun He
> Assignee: Liangjun He
> Priority: Minor
> Labels: pull-request-available
>
> The current BPOfferService.reportBadBlocks() method can be optimized by 
> moving the creation of the rbbAction object outside the loop.
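
The change under discussion, creating the action object once before the loop
instead of once per iteration, can be sketched as follows. This is a
simplified illustration, not the Hadoop code: Action and Actor are
hypothetical stand-ins for ReportBadBlockAction and the per-NameNode actors,
and the methods return the number of objects allocated only so that the
difference is visible. Sharing one instance across queues is assumed to be
safe here because the toy Action is immutable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReportBadBlockSketch {

  /** Hypothetical immutable action; stands in for ReportBadBlockAction. */
  static final class Action {
    final String blockId;

    Action(String blockId) {
      this.blockId = blockId;
    }
  }

  /** Hypothetical actor with a queue of pending actions. */
  static final class Actor {
    final List<Action> queue = new ArrayList<>();

    void enqueue(Action action) {
      queue.add(action);
    }
  }

  /** Before: one Action is allocated per actor inside the loop. */
  static int reportPerActor(List<Actor> actors, String blockId) {
    int allocations = 0;
    for (Actor actor : actors) {
      actor.enqueue(new Action(blockId));
      allocations++;
    }
    return allocations;
  }

  /** After: a single Action is allocated up front and shared by all actors. */
  static int reportShared(List<Actor> actors, String blockId) {
    Action shared = new Action(blockId);
    for (Actor actor : actors) {
      actor.enqueue(shared);
    }
    return 1;
  }

  public static void main(String[] args) {
    List<Actor> actors = Arrays.asList(new Actor(), new Actor(), new Actor());
    System.out.println("per-actor allocations: "
        + reportPerActor(actors, "blk_1"));
    System.out.println("shared allocations:    "
        + reportShared(actors, "blk_1"));
  }
}
```

With only a handful of actors per block pool the saving is small, which is
consistent with the reviewer's question about how much the change actually
buys.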






[jira] [Commented] (HDFS-17143) Optimize the logic for reconfigure ReadStrategy enable for Namenode

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751921#comment-17751921
 ] 

ASF GitHub Bot commented on HDFS-17143:
---

huangzhaobo99 commented on PR #5930:
URL: https://github.com/apache/hadoop/pull/5930#issuecomment-1669064696

   Hi @slfan1989, could you help review this when you have time? Thanks.




> Optimize the logic for reconfigure ReadStrategy enable for Namenode
> ---
>
> Key: HDFS-17143
> URL: https://issues.apache.org/jira/browse/HDFS-17143
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: huangzhaobo99
> Assignee: huangzhaobo99
> Priority: Minor
> Labels: pull-request-available
>







[jira] [Commented] (HDFS-17143) Optimize the logic for reconfigure ReadStrategy enable for Namenode

2023-08-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751920#comment-17751920
 ] 

ASF GitHub Bot commented on HDFS-17143:
---

huangzhaobo99 commented on PR #5930:
URL: https://github.com/apache/hadoop/pull/5930#issuecomment-1669062915

   The failed unit tests are unrelated to this change, and they pass
locally.







