[jira] [Commented] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.
[ https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752284#comment-17752284 ]

Xiaoqiao He commented on HDFS-17149:
------------------------------------

Hi [~zhanghaobo], please check whether HDFS-15079 can solve this issue.

> getBlockLocations RPC should use actual client ip to compute network distance
> when using RBF.
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-17149
>                 URL: https://issues.apache.org/jira/browse/HDFS-17149
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.4.0
>            Reporter: farmmamba
>            Assignee: farmmamba
>            Priority: Major
>
> Please correct me if I understand this wrongly. Thanks.
> Currently, when a getBlockLocations RPC is forwarded to the namenode via a
> router, the NameNode uses the router's ip address as the client machine to
> compute network distance against the block's locations. See
> FSNamesystem#sortLocatedBlocks for more detailed information.
> I think this computation is not correct and should use the actual client ip.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
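The network-distance computation at issue can be illustrated with a minimal, self-contained sketch. This models the general rack-aware distance idea (distance = hops from each host up to their closest common topology ancestor), not Hadoop's actual NetworkTopology class; the paths and names are illustrative.

```java
/**
 * Minimal sketch of rack-aware network distance: a node's location is a
 * path like "/dc1/rack1/host1", and the distance between two nodes is the
 * number of hops from each host up to their closest common ancestor.
 * Same host -> 0, same rack -> 2, different rack -> 4.
 */
public class NetworkDistance {

  /** Distance between two nodes given their topology paths. */
  public static int distance(String pathA, String pathB) {
    String[] a = pathA.substring(1).split("/");
    String[] b = pathB.substring(1).split("/");
    int common = 0;
    while (common < a.length && common < b.length
        && a[common].equals(b[common])) {
      common++;
    }
    // Hops from each host up to the closest common ancestor.
    return (a.length - common) + (b.length - common);
  }

  public static void main(String[] args) {
    // Measured from the real client, a same-rack replica is closest...
    System.out.println(distance("/dc1/rack1/client", "/dc1/rack1/dn1")); // 2
    System.out.println(distance("/dc1/rack1/client", "/dc1/rack2/dn2")); // 4
    // ...but measured from the router's rack, the ordering can flip.
    System.out.println(distance("/dc1/rack2/router", "/dc1/rack2/dn2")); // 2
  }
}
```

This is why sorting replicas by the router's location instead of the actual client's can hand the client a suboptimally ordered block-location list.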
[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy
[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752282#comment-17752282 ]

ASF GitHub Bot commented on HDFS-17030:
---------------------------------------

xinglin commented on PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#issuecomment-1670751349

   > > Hi @goiri, could you take a look at this backport PR for branch-3.3 as well? Thanks.
   >
   > You'd have to put a separate PR together I'd say.

   I am confused: this is a separate PR, right?

> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------
>
>                 Key: HDFS-17030
>                 URL: https://issues.apache.org/jira/browse/HDFS-17030
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: Xing Lin
>            Assignee: Xing Lin
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
> When namenode HA is enabled and a standby NN is not responsive, we have
> observed that it can take a long time to serve a request, even though we have
> a healthy observer or active NN.
> Basically, when a standby is down, the RPC client will (re)try to create a
> socket connection to that standby for ipc.client.connect.timeout *
> ipc.client.connect.max.retries.on.timeouts before giving up. When we take a
> heap dump at a standby, the NN still accepts the socket connection but it
> won't send responses to these RPC requests, and we time out after
> ipc.client.rpc-timeout.ms. This adds significant latency. For clusters at
> LinkedIn, we set ipc.client.rpc-timeout.ms to 120 seconds, so a request takes
> more than 2 minutes to complete when we take a heap dump at a standby. This
> has been causing user job failures.
> We could set ipc.client.rpc-timeout.ms to a smaller value when sending
> getHAServiceState requests in ObserverReaderProxy (for user rpc requests, we
> would still use the original value from the config). However, that would
> double the number of socket connections between clients and the NN, which is
> a deal-breaker.
> The proposal is to add a timeout on getHAServiceState() calls in
> ObserverReaderProxy: we only wait up to the timeout for an NN to report its
> HA state, and once we pass that timeout we move on to probe the next NN.
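The proposed bounded wait can be sketched as follows. This is an illustrative model, not the actual ObserverReadProxyProvider code: the RPC is simulated with a sleep, and the method names and fallback behavior are assumptions based on the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Sketch of a bounded HA-state probe: submit the probe to an executor and
 * wait at most probeTimeoutMs; on timeout, cancel the task and return null
 * so the caller can move on to the next NameNode.
 */
public class BoundedHaProbe {
  // Daemon threads so a stuck probe never prevents JVM shutdown.
  private static final ExecutorService POOL =
      Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "ha-probe");
        t.setDaemon(true);
        return t;
      });

  /**
   * delayMs simulates how long the NN takes to answer getHAServiceState();
   * probeTimeoutMs is the bound we are willing to wait (<= 0 disables it).
   */
  public static String probeWithTimeout(long delayMs, long probeTimeoutMs) {
    Future<String> task = POOL.submit(() -> {
      Thread.sleep(delayMs); // stands in for the actual RPC
      return "OBSERVER";
    });
    try {
      return probeTimeoutMs > 0
          ? task.get(probeTimeoutMs, TimeUnit.MILLISECONDS)
          : task.get();
    } catch (TimeoutException e) {
      task.cancel(true); // give up on the unresponsive NN
      return null;       // caller moves on to probe the next NN
    } catch (InterruptedException | ExecutionException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(probeWithTimeout(10, 1000));  // responsive NN
    System.out.println(probeWithTimeout(5000, 100)); // stuck NN -> null
  }
}
```

The key design point is that the timeout is applied only to the cheap HA-state probe, so the expensive user RPCs keep their original ipc.client.rpc-timeout.ms and no extra long-lived connections are created.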
[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy
[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752281#comment-17752281 ]

ASF GitHub Bot commented on HDFS-17030:
---------------------------------------

xinglin commented on code in PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#discussion_r1288010220

##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java:
##########

@@ -285,13 +323,67 @@ private synchronized NNProxyInfo<T> changeProxy(NNProxyInfo<T> initial) {
     }
     currentIndex = (currentIndex + 1) % nameNodeProxies.size();
     currentProxy = createProxyIfNeeded(nameNodeProxies.get(currentIndex));
-    currentProxy.setCachedState(getHAServiceState(currentProxy));
+    currentProxy.setCachedState(getHAServiceStateWithTimeout(currentProxy));
     LOG.debug("Changed current proxy from {} to {}",
         initial == null ? "none" : initial.proxyInfo, currentProxy.proxyInfo);
     return currentProxy;
   }

+  /**
+   * Execute a getHAServiceState() call with a timeout, to avoid a long wait
+   * when an NN becomes unresponsive to rpc requests
+   * (e.g., when a thread/heap dump is being taken).
+   *
+   * For each getHAServiceState() call, a task is created and submitted to a
+   * threadpool for execution. We wait for a response up to
+   * namenodeHAStateProbeTimeoutMs and cancel these requests if they time out.
+   *
+   * The implementation is split into two functions so that we can unit test
+   * the second function.
+   */
+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo<T> proxyInfo) {
+    Callable<HAServiceState> getHAServiceStateTask = () -> getHAServiceState(proxyInfo);
+
+    try {
+      Future<HAServiceState> task = nnProbingThreadPool.submit(getHAServiceStateTask);

Review Comment:
   Fixed. It fits in one line within 100 characters, so I did not bother splitting it into two lines.

> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------
[jira] [Commented] (HDFS-17030) Limit wait time for getHAServiceState in ObserverReaderProxy
[ https://issues.apache.org/jira/browse/HDFS-17030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752280#comment-17752280 ]

ASF GitHub Bot commented on HDFS-17030:
---------------------------------------

xinglin commented on code in PR #5878:
URL: https://github.com/apache/hadoop/pull/5878#discussion_r1288009483

##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/ObserverReadProxyProvider.java:
##########

@@ -285,13 +323,67 @@ private synchronized NNProxyInfo<T> changeProxy(NNProxyInfo<T> initial) {
     }
     currentIndex = (currentIndex + 1) % nameNodeProxies.size();
     currentProxy = createProxyIfNeeded(nameNodeProxies.get(currentIndex));
-    currentProxy.setCachedState(getHAServiceState(currentProxy));
+    currentProxy.setCachedState(getHAServiceStateWithTimeout(currentProxy));
     LOG.debug("Changed current proxy from {} to {}",
         initial == null ? "none" : initial.proxyInfo, currentProxy.proxyInfo);
     return currentProxy;
   }

+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo<T> proxyInfo) {
+    Callable<HAServiceState> getHAServiceStateTask = () -> getHAServiceState(proxyInfo);
+
+    try {
+      Future<HAServiceState> task = nnProbingThreadPool.submit(getHAServiceStateTask);
+      return getHAServiceStateWithTimeout(proxyInfo, task);
+    } catch (RejectedExecutionException e) {
+      LOG.warn("Run out of threads to submit the request to query HA state. "
+          + "Ok to return null and we will fallback to use active NN to serve "
+          + "this request.");
+      return null;
+    }
+  }
+
+  HAServiceState getHAServiceStateWithTimeout(final NNProxyInfo<T> proxyInfo,
+      Future<HAServiceState> task) {
+    HAServiceState state = null;
+    try {
+      if (namenodeHAStateProbeTimeoutMs > 0) {
+        state = task.get(namenodeHAStateProbeTimeoutMs, TimeUnit.MILLISECONDS);
+      } else {
+        // Disable the timeout by waiting indefinitely when
+        // namenodeHAStateProbeTimeoutMs is set to 0 or a negative value.
+        state = task.get();
+      }
+      LOG.debug("HA State for {} is {}", proxyInfo.proxyInfo, state);
+    } catch (TimeoutException e) {
+      // Cancel the task on timeout
+      String msg = String.format("Cancel NN probe task due to timeout for %s",
+          proxyInfo.proxyInfo);
+      LOG.warn(msg, e);
+      if (task != null) {

Review Comment:
   removed.

> Limit wait time for getHAServiceState in ObserverReaderProxy
> ------------------------------------------------------------

--
[jira] [Assigned] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.
[ https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

farmmamba reassigned HDFS-17149:
--------------------------------

    Assignee: farmmamba

> getBlockLocations RPC should use actual client ip to compute network distance
> when using RBF.
> -----------------------------------------------------------------------------
[jira] [Commented] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.
[ https://issues.apache.org/jira/browse/HDFS-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752243#comment-17752243 ]

farmmamba commented on HDFS-17149:
----------------------------------

[~hexiaoqiao] [~ayushsaxena] [~tomscut] [~zhangshuyan] Sorry for disturbing you here. Please have a look at this issue when you have free time, and please correct me if I understand it wrongly. Thanks all.

> getBlockLocations RPC should use actual client ip to compute network distance
> when using RBF.
> -----------------------------------------------------------------------------
[jira] [Created] (HDFS-17149) getBlockLocations RPC should use actual client ip to compute network distance when using RBF.
farmmamba created HDFS-17149:
--------------------------------

             Summary: getBlockLocations RPC should use actual client ip to compute network distance when using RBF.
                 Key: HDFS-17149
                 URL: https://issues.apache.org/jira/browse/HDFS-17149
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: namenode
    Affects Versions: 3.4.0
            Reporter: farmmamba
[jira] [Created] (HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
Hector Sandoval Chaverri created HDFS-17148:
-----------------------------------------------

             Summary: RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL
                 Key: HDFS-17148
                 URL: https://issues.apache.org/jira/browse/HDFS-17148
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: rbf
            Reporter: Hector Sandoval Chaverri

The SQLDelegationTokenSecretManager fetches tokens from SQL and stores them temporarily in an in-memory cache with a short TTL. The ExpiredTokenRemover in AbstractDelegationTokenSecretManager runs periodically to clean up any expired tokens from the cache, but by then most tokens have already been evicted automatically per the TTL configuration. This leaves many expired tokens in the SQL database that are never cleaned up.

The SQLDelegationTokenSecretManager should look for expired tokens in SQL, instead of in the memory cache, when running the periodic cleanup.
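The gist of the fix can be modeled without any SQL at all: the periodic remover must scan the authoritative store rather than the short-TTL cache. A toy sketch, where a plain map stands in for the SQL token table (all names here are illustrative, not the secret manager's actual API):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of expired-token cleanup against the authoritative store.
 * Cleaning only a short-TTL cache misses rows that were already evicted,
 * which is exactly the bug described above.
 */
public class TokenCleanup {

  /** Scan the store (tokenId -> expiry time) and drop expired entries. */
  public static int removeExpired(Map<String, Long> store, long now) {
    int removed = 0;
    Iterator<Map.Entry<String, Long>> it = store.entrySet().iterator();
    while (it.hasNext()) {
      if (it.next().getValue() <= now) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }

  /** Tiny stand-in for the SQL token table. */
  public static Map<String, Long> demoStore() {
    Map<String, Long> store = new LinkedHashMap<>();
    store.put("token-1", 100L); // already expired at now = 200
    store.put("token-2", 300L); // still valid
    return store;
  }

  public static void main(String[] args) {
    Map<String, Long> store = demoStore();
    System.out.println(removeExpired(store, 200L)); // 1
    System.out.println(store.keySet());             // [token-2]
  }
}
```

In the real fix the "scan" would be a query against the SQL table's expiry column, so eviction from the memory cache no longer hides expired rows from the remover.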
[jira] [Commented] (HDFS-16977) Forbid assigned characters in pathname.
[ https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752125#comment-17752125 ]

ASF GitHub Bot commented on HDFS-16977:
---------------------------------------

hadoop-yetus commented on PR #5547:
URL: https://github.com/apache/hadoop/pull/5547#issuecomment-1669950199

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: | reexec | 0m 44s | | Docker mode activated. |
   | _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 1s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. |
   | _ trunk Compile Tests _ |
   | +0 :ok: | mvndep | 14m 3s | | Maven dependency ordering for branch |
   | +1 :green_heart: | mvninstall | 32m 12s | | trunk passed |
   | +1 :green_heart: | compile | 5m 37s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 5m 32s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 1m 30s | | trunk passed |
   | +1 :green_heart: | mvnsite | 2m 32s | | trunk passed |
   | +1 :green_heart: | javadoc | 2m 0s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 2m 26s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 5m 54s | | trunk passed |
   | +1 :green_heart: | shadedclient | 36m 9s | | branch has no errors when building and testing our client artifacts. |
   | _ Patch Compile Tests _ |
   | +0 :ok: | mvndep | 0m 32s | | Maven dependency ordering for patch |
   | +1 :green_heart: | mvninstall | 2m 3s | | the patch passed |
   | +1 :green_heart: | compile | 5m 26s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 5m 26s | | the patch passed |
   | +1 :green_heart: | compile | 5m 18s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 5m 18s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | -0 :warning: | checkstyle | 1m 18s | [/results-checkstyle-hadoop-hdfs-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/results-checkstyle-hadoop-hdfs-project.txt) | hadoop-hdfs-project: The patch generated 3 new + 395 unchanged - 0 fixed = 398 total (was 395) |
   | +1 :green_heart: | mvnsite | 2m 10s | | the patch passed |
   | +1 :green_heart: | javadoc | 1m 36s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 2m 11s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 5m 46s | | the patch passed |
   | +1 :green_heart: | shadedclient | 35m 55s | | patch has no errors when building and testing our client artifacts. |
   | _ Other Tests _ |
   | +1 :green_heart: | unit | 2m 29s | | hadoop-hdfs-client in the patch passed. |
   | -1 :x: | unit | 221m 18s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. |
   | | | | 397m 0s | | |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   | | hadoop.hdfs.server.datanode.TestDirectoryScanner |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5547/5/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5547 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux aa50f62265b1 4.15.0-212-generic #223-Ubuntu SMP Tue May 23 13:09:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin
[jira] [Commented] (HDFS-17093) In the case of all datanodes sending FBR when the namenode restarts (large clusters), there is an issue with incomplete block reporting
[ https://issues.apache.org/jira/browse/HDFS-17093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752071#comment-17752071 ]

ASF GitHub Bot commented on HDFS-17093:
---------------------------------------

hadoop-yetus commented on PR #5855:
URL: https://github.com/apache/hadoop/pull/5855#issuecomment-1669741496

   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: | reexec | 0m 30s | | Docker mode activated. |
   | _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
   | _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 33m 7s | | trunk passed |
   | +1 :green_heart: | compile | 0m 52s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | compile | 0m 49s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | checkstyle | 0m 47s | | trunk passed |
   | +1 :green_heart: | mvnsite | 0m 55s | | trunk passed |
   | +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 11s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 1m 59s | | trunk passed |
   | +1 :green_heart: | shadedclient | 22m 2s | | branch has no errors when building and testing our client artifacts. |
   | _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 0m 46s | | the patch passed |
   | +1 :green_heart: | compile | 0m 50s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javac | 0m 50s | | the patch passed |
   | +1 :green_heart: | compile | 0m 42s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | javac | 0m 42s | | the patch passed |
   | -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/blanks-eol.txt) | The patch has 2 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
   | +1 :green_heart: | checkstyle | 0m 35s | | the patch passed |
   | +1 :green_heart: | mvnsite | 0m 47s | | the patch passed |
   | +1 :green_heart: | javadoc | 0m 37s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 |
   | +1 :green_heart: | javadoc | 1m 6s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | +1 :green_heart: | spotbugs | 1m 53s | | the patch passed |
   | +1 :green_heart: | shadedclient | 22m 21s | | patch has no errors when building and testing our client artifacts. |
   | _ Other Tests _ |
   | -1 :x: | unit | 202m 35s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
   | | | | 297m 10s | | |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux d0dcaa24cb96 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5af06d98849707bed42863172dc38247aba428c8 |
   | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5855/12/testReport/ |
   | Max
[jira] [Commented] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752003#comment-17752003 ]

ASF GitHub Bot commented on HDFS-17137:
---------------------------------------

haiyang1987 commented on PR #5913:
URL: https://github.com/apache/hadoop/pull/5913#issuecomment-1669411025

   Thanks @Hexiaoqiao @tomscut for helping review and merge!

> Standby/Observer NameNode skip to handle redundant replica block logic when
> set decrease replication.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-17137
>                 URL: https://issues.apache.org/jira/browse/HDFS-17137
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
> The Standby/Observer NameNode should not handle the redundant-replica logic
> when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls BlockManager#processExtraRedundancyBlock to
>   select the DN holding the redundant replica, adds it to the
>   excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor
>   is then scheduled to delete the block on that DN).
> * The standby or observer NameNode then loads the editlog and applies the
>   SetReplicationOp. If the DN whose replica is to be deleted has not yet sent
>   an incremental block report, BlockManager#processExtraRedundancyBlock is
>   called here as well to select a DN with a redundant replica and add it to
>   the excessRedundancyMap (the DN selected here may differ from the one
>   selected on the active NameNode).
> A DN left in the excessRedundancyMap can affect DN decommission, so the
> decommission may never complete on the Standby/Observer NameNode.
> A concrete case: a file has 3 replicas (d1, d2, d3) and setReplication is
> called to reduce the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the
>   excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays SetReplicationOp (at this time d1 has not yet
>   sent its incremental block report), so the redundant replica DN selected
>   here may differ from the active NameNode's choice, e.g. it selects d2 and
>   adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * The DN list for this block on the active NameNode becomes d2 and d3 (d1 is
>   removed from the excessRedundancyMap while processing the incremental
>   block report).
> * The DN list for this block on the standby NameNode is also d2 and d3, but
>   d2 cannot be removed from the excessRedundancyMap while processing the
>   incremental block report.
> Now execute a decommission on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends
>   an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4,
>   so d3 can be decommissioned normally.
> * On the standby NameNode the list is d3 (decommissioning), d2 (excess), d4;
>   since the requirement of two live replicas is not met, d3 cannot be
>   decommissioned.
> Therefore, the Standby/Observer NameNode should not process the
> redundant-replica logic when setReplication is applied.
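The proposed behavior reduces to a simple guard: only the active NameNode runs excess-replica selection. A hedged sketch (the enum and method names here are illustrative, not the actual BlockManager code):

```java
/**
 * Sketch of the guard proposed above: only the active NameNode picks
 * excess replicas when replication is decreased; standby/observer just
 * record the new replication factor and let incremental block reports
 * converge the replica set, avoiding divergent excessRedundancyMap state.
 */
public class SetReplicationGuard {
  public enum HAState { ACTIVE, STANDBY, OBSERVER }

  /** Returns true iff this NameNode should run excess-replica selection. */
  public static boolean shouldProcessExtraRedundancy(
      HAState state, int oldReplication, int newReplication) {
    return state == HAState.ACTIVE && newReplication < oldReplication;
  }

  public static void main(String[] args) {
    // Active NN decreasing 3 -> 2 replicas: select and invalidate excess.
    System.out.println(shouldProcessExtraRedundancy(HAState.ACTIVE, 3, 2));
    // Standby NN replaying the same SetReplicationOp: skip selection.
    System.out.println(shouldProcessExtraRedundancy(HAState.STANDBY, 3, 2));
  }
}
```

The point of the guard is that the standby's independent (and possibly different) choice of excess replica is what poisons its excessRedundancyMap and blocks decommission; skipping the selection entirely removes the divergence.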
[jira] [Commented] (HDFS-16977) Forbid assigned characters in pathname.
[ https://issues.apache.org/jira/browse/HDFS-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751984#comment-17751984 ]

ASF GitHub Bot commented on HDFS-16977:
---------------------------------------

YuanbenWang closed pull request #5547: HDFS-16977. Forbid assigned characters in pathname.
URL: https://github.com/apache/hadoop/pull/5547

> Forbid assigned characters in pathname.
> ---------------------------------------
>
>                 Key: HDFS-16977
>                 URL: https://issues.apache.org/jira/browse/HDFS-16977
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: dfsclient, namenode
>    Affects Versions: 3.3.4
>            Reporter: WangYuanben
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HDFS-16977__Forbid_assigned_characters_in_pathname_.patch
>
> Pathnames that contain special characters can lead to unexpected results. For
> example, there is a file named "/foo/file*" in my cluster, created by
> "DistributedFileSystem.create(new Path("/foo/file*"))". When I want to remove
> it, I type "hadoop fs -rm /foo/file*" in the shell. However, I unexpectedly
> remove all files with the prefix "/foo/file". There are also other such
> characters besides '*', such as ' ', '|', '&', etc.
> Therefore, it is necessary to restrict the occurrence of these characters in
> pathnames. A simple but effective way is to forbid the assigned characters
> when a new file or directory is created.
> It is also important to add the same function to the Router and WebHdfs
> models. I will add them as two subtasks later.
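A minimal sketch of the kind of create-time validation proposed (the forbidden character set below is illustrative, not the patch's actual or configurable list):

```java
/**
 * Sketch of server-side pathname validation: reject path components that
 * contain shell-sensitive characters at create time, so files like
 * "/foo/file*" can never be created in the first place.
 */
public class PathNameValidator {
  // Illustrative set of shell-glob/metacharacters; a real implementation
  // would likely make this configurable.
  private static final String FORBIDDEN = "*|&;<>?";

  /** Validate a single path component. */
  public static boolean isValidComponent(String component) {
    for (char c : component.toCharArray()) {
      if (FORBIDDEN.indexOf(c) >= 0) {
        return false;
      }
    }
    return true;
  }

  /** Validate every component of a slash-separated path. */
  public static boolean isValidPath(String path) {
    for (String component : path.split("/")) {
      if (!isValidComponent(component)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isValidPath("/foo/file1")); // true
    System.out.println(isValidPath("/foo/file*")); // false
  }
}
```

Rejecting such names at create time is simpler than teaching every shell user to quote globs, which is the failure mode described in the issue.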
[jira] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
[ https://issues.apache.org/jira/browse/HDFS-17147 ]

Yuanbo Liu deleted comment on HDFS-17147:
-------------------------------------------

was (Author: yuanbo): cc [~inigoiri] [~zhangshuyan] [~ayushsaxena]

> RBF: RouterRpcServer getListing become extremely slow when the children of
> the dir are mounted in the same ns.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-17147
>                 URL: https://issues.apache.org/jira/browse/HDFS-17147
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: rbf
>            Reporter: Yuanbo Liu
>            Priority: Major
>
> Suppose we have a mount table as below:
> {code:java}
> /dir -> ns0 -> /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
> When listing /dir with RBF, it gets extremely slow because getListing has
> two parts:
> 1. list all children of /target/dir
> 2. append the remaining 200 mount points to the result
> The second part invokes getFileInfo concurrently to make sure the mount
> points are accessed with the right permissions. But in this case the first
> part already includes the result of the second part, so there is no need to
> append the second part repeatedly.
[jira] [Commented] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
[ https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751980#comment-17751980 ] Yuanbo Liu commented on HDFS-17147: --- cc [~inigoiri] [~zhangshuyan] [~ayushsaxena]
> RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Yuanbo Liu
> Priority: Major
>
> Suppose the mount table is as below:
> {code:java}
> /dir -> ns0 -> /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
> When listing /dir with RBF, it gets extremely slow because getListing has two parts:
> 1. list all children of /target/dir
> 2. append the remaining 200 mount points to the result.
> The second part invokes getFileInfo concurrently to make sure the mount points are accessed with the correct permissions. But in this case the first part already includes the result of the second part, so there is no need to append the second part repeatedly.
[jira] [Updated] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
[ https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-17147: -- Component/s: rbf
> RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Yuanbo Liu
> Priority: Major
>
> Suppose the mount table is as below:
> /dir -> ns0 -> /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
>
> When listing /dir with RBF, it gets extremely slow because getListing has two parts:
> 1. list all children of /target/dir
> 2. append the remaining 200 mount points to the result.
> The second part invokes getFileInfo concurrently to make sure the mount points are accessed with the correct permissions. But in this case the first part already includes the result of the second part, so there is no need to append the second part repeatedly.
[jira] [Updated] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
[ https://issues.apache.org/jira/browse/HDFS-17147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated HDFS-17147: -- Description:
Suppose the mount table is as below:
{code:java}
/dir -> ns0 -> /target/dir
/dir/child1 -> ns0 -> /target/dir/child1
/dir/child2 -> ns0 -> /target/dir/child2
..
/dir/child200 -> ns0 -> /target/dir/child200
{code}
When listing /dir with RBF, it gets extremely slow because getListing has two parts:
1. list all children of /target/dir
2. append the remaining 200 mount points to the result.
The second part invokes getFileInfo concurrently to make sure the mount points are accessed with the correct permissions. But in this case the first part already includes the result of the second part, so there is no need to append the second part repeatedly.

was: Suppose we mount table as below: /dir -> ns0 -> /target/dir /dir/child1 -> ns0 -> /target/dir/child1 /dir/child2 -> ns0 -> /target/dir/child2 .. /dir/child200 -> ns0 -> /target/dir/child200 when listing /dir with RBF, it's getting extremely slow as getListing has two parts: 1. list all children of /target/dir 2. append the rest 200 mount points to the result. The second part invoke getFileInfo concurrently to make sure mount points are accessed under rightful permission. But in this case, the first part includes the result of the second part, and there is no need to append second part repeatly.

> RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
> --
>
> Key: HDFS-17147
> URL: https://issues.apache.org/jira/browse/HDFS-17147
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Reporter: Yuanbo Liu
> Priority: Major
>
> Suppose the mount table is as below:
> {code:java}
> /dir -> ns0 -> /target/dir
> /dir/child1 -> ns0 -> /target/dir/child1
> /dir/child2 -> ns0 -> /target/dir/child2
> ..
> /dir/child200 -> ns0 -> /target/dir/child200
> {code}
> When listing /dir with RBF, it gets extremely slow because getListing has two parts:
> 1. list all children of /target/dir
> 2. append the remaining 200 mount points to the result.
> The second part invokes getFileInfo concurrently to make sure the mount points are accessed with the correct permissions. But in this case the first part already includes the result of the second part, so there is no need to append the second part repeatedly.
[jira] [Created] (HDFS-17147) RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns.
Yuanbo Liu created HDFS-17147: - Summary: RBF: RouterRpcServer getListing become extremely slow when the children of the dir are mounted in the same ns. Key: HDFS-17147 URL: https://issues.apache.org/jira/browse/HDFS-17147 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yuanbo Liu
Suppose the mount table is as below:
/dir -> ns0 -> /target/dir
/dir/child1 -> ns0 -> /target/dir/child1
/dir/child2 -> ns0 -> /target/dir/child2
..
/dir/child200 -> ns0 -> /target/dir/child200
When listing /dir with RBF, it gets extremely slow because getListing has two parts:
1. list all children of /target/dir
2. append the remaining 200 mount points to the result.
The second part invokes getFileInfo concurrently to make sure the mount points are accessed with the correct permissions. But in this case the first part already includes the result of the second part, so there is no need to append the second part repeatedly.
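The redundant second step described in the HDFS-17147 entries above can be avoided by skipping mount points that the downstream namespace listing already covers. A minimal sketch of that idea, assuming simplified names (MountListingSketch, mergeListing, plain String entries) that are illustrative stand-ins, not the actual RouterClientProtocol/RouterRpcServer APIs:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simplification of the router-side getListing merge.
public class MountListingSketch {

    /**
     * Merge the listing returned by the downstream namenode with the mount
     * points mounted under the same directory. Mount points already present
     * in the namespace listing are skipped, avoiding one concurrent
     * getFileInfo permission check per already-covered mount point
     * (200 of them in the issue's example).
     */
    static List<String> mergeListing(List<String> nsListing, List<String> mountPoints) {
        Set<String> seen = new HashSet<>(nsListing);
        List<String> result = new ArrayList<>(nsListing);
        for (String mp : mountPoints) {
            if (!seen.contains(mp)) {
                // Only mount points NOT covered by the downstream listing
                // still need to be resolved and appended.
                result.add(mp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ns = Arrays.asList("child1", "child2", "child3");
        List<String> mounts = Arrays.asList("child1", "child2", "other");
        // child1/child2 are deduplicated; only "other" is appended.
        System.out.println(mergeListing(ns, mounts)); // [child1, child2, child3, other]
    }
}
```

The design point is simply that membership in the namespace listing makes the extra permission round-trip for that mount point redundant.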
[jira] [Commented] (HDFS-17140) Optimize the BPOfferService.reportBadBlocks() method
[ https://issues.apache.org/jira/browse/HDFS-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751937#comment-17751937 ] ASF GitHub Bot commented on HDFS-17140: --- slfan1989 commented on code in PR #5924: URL: https://github.com/apache/hadoop/pull/5924#discussion_r1286759399 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java: ## @@ -291,9 +291,8 @@ public String toString() { void reportBadBlocks(ExtendedBlock block, String storageUuid, StorageType storageType) { checkBlock(block); +ReportBadBlockAction rbbAction = new ReportBadBlockAction(block, storageUuid, storageType); Review Comment: Thank you very much for your explanation! Personally, I believe this modification seems a bit forced and unnecessary. Let's wait for @2005hithlj to give a detailed explanation. > Optimize the BPOfferService.reportBadBlocks() method > > > Key: HDFS-17140 > URL: https://issues.apache.org/jira/browse/HDFS-17140 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Liangjun He >Assignee: Liangjun He >Priority: Minor > Labels: pull-request-available > > The current BPOfferService.reportBadBlocks() method can be optimized by > moving the creation of the rbbAction object outside the loop. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
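The diff under review hoists the ReportBadBlockAction allocation out of the per-actor loop, since the action is the same for every BPServiceActor. A minimal sketch of the before/after idea, using simplified stand-in types rather than the real Hadoop classes:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-ins for the Hadoop types involved; field layout is illustrative only.
public class ReportBadBlocksSketch {

    static class ReportBadBlockAction {
        final String block, storageUuid, storageType;
        ReportBadBlockAction(String block, String storageUuid, String storageType) {
            this.block = block;
            this.storageUuid = storageUuid;
            this.storageType = storageType;
        }
    }

    static class BPServiceActor {
        final List<ReportBadBlockAction> queue = new ArrayList<>();
        void bpThreadEnqueue(ReportBadBlockAction action) { queue.add(action); }
    }

    static void reportBadBlocks(List<BPServiceActor> actors,
                                String block, String storageUuid, String storageType) {
        // Hoisted out of the loop: one shared immutable action instead of
        // one allocation per actor. The saving is only the per-iteration
        // object creation, which is the point Hexiaoqiao questions below.
        ReportBadBlockAction rbbAction =
            new ReportBadBlockAction(block, storageUuid, storageType);
        for (BPServiceActor actor : actors) {
            actor.bpThreadEnqueue(rbbAction);
        }
    }

    public static void main(String[] args) {
        List<BPServiceActor> actors =
            Arrays.asList(new BPServiceActor(), new BPServiceActor());
        reportBadBlocks(actors, "blk_1", "uuid-1", "DISK");
        // Both actors now reference the same instance.
        System.out.println(actors.get(0).queue.get(0) == actors.get(1).queue.get(0)); // true
    }
}
```

Sharing one immutable action across actors is safe only because the action carries no per-actor mutable state, which matches the reasoning in the review thread.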
[jira] [Commented] (HDFS-17145) Fix description of property dfs.namenode.file.close.num-committed-allowed.
[ https://issues.apache.org/jira/browse/HDFS-17145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751935#comment-17751935 ] ASF GitHub Bot commented on HDFS-17145: --- hadoop-yetus commented on PR #5933: URL: https://github.com/apache/hadoop/pull/5933#issuecomment-1669119407 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 29s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 11s | | trunk passed | | +1 :green_heart: | compile | 0m 55s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | compile | 0m 50s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | mvnsite | 0m 56s | | trunk passed | | +1 :green_heart: | javadoc | 0m 51s | | trunk passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | shadedclient | 57m 33s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 45s | | the patch passed | | +1 :green_heart: | compile | 0m 48s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javac | 0m 48s | | the patch passed | | +1 :green_heart: | compile | 0m 41s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | javac | 0m 41s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | mvnsite | 0m 47s | | the patch passed | | +1 :green_heart: | javadoc | 0m 39s | | the patch passed with JDK Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 | | +1 :green_heart: | javadoc | 1m 0s | | the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | +1 :green_heart: | shadedclient | 23m 48s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 198m 41s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. 
| | | | 287m 14s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5933 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint | | uname | Linux b57df667dd49 4.15.0-213-generic #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / fb8b16d54e29e3309d09130814e4f2c34dc6e1b2 | | Default Java | Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20+8-post-Ubuntu-1ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~20.04.1-b05 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/testReport/ | | Max. process+thread count | 3658 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5933/1/console | | versions | git=2.25.1 maven=3.6.3 | | Powered
[jira] [Updated] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-17137: --- Component/s: namenode
> Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Standby/Observer NameNode should not handle the redundant-replica block logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to the excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor is then scheduled to delete the block on that DN).
> * The Standby or Observer NameNode then loads the editlog and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent an incremental block report, BlockManager#processExtraRedundancyBlock is called here as well to select a redundant-replica DN and add it to the excessRedundancyMap (the DN selected here may be inconsistent with the one selected on the active NameNode).
> A DN left in the excessRedundancyMap can block DN decommissioning, so the decommission operation cannot complete on the Standby/Observer NameNode.
> A concrete case: suppose a file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays the SetReplicationOp (at this time d1 has not yet sent an incremental block report), so the redundant replica it selects may differ from the active NameNode's choice; for example, it selects d2 and adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * On the active NameNode the block's DN list becomes d2 and d3 (d1 is removed from the excessRedundancyMap while processing the incremental block report).
> * On the standby NameNode the block's DN list is also d2 and d3 (but d2 cannot be removed from the excessRedundancyMap while processing the incremental block report).
> Now execute the decommission operation on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4, so d3 can be decommissioned normally.
> * On the standby NameNode the block's DN list is d3 (decommissioning), d2 (excess), d4; since the requirement of two live replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby/Observer NameNode should not process the redundant-replica logic when setReplication is applied.
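The core of the fix described above can be sketched as an HA-state guard around the extra-redundancy processing. The enum and method names below are illustrative stand-ins, not the actual Hadoop code:

```java
// Hedged sketch: only the active namenode selects excess replicas when a
// SetReplicationOp decreases replication; standby/observer skip it and let
// the active's deletions arrive via (incremental) block reports.
public class SetReplicationSketch {

    enum HAState { ACTIVE, STANDBY, OBSERVER }

    /**
     * Whether this namenode should run the extra-redundancy selection
     * (BlockManager#processExtraRedundancyBlock in the real code).
     * A standby/observer must not pick its own "excess" replica: its choice
     * can diverge from the active's (d1 vs d2 in the issue's example),
     * leaving a stale excessRedundancyMap entry that blocks decommission.
     */
    static boolean shouldProcessExtraRedundancy(HAState state) {
        return state == HAState.ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(shouldProcessExtraRedundancy(HAState.ACTIVE));   // true
        System.out.println(shouldProcessExtraRedundancy(HAState.STANDBY));  // false
        System.out.println(shouldProcessExtraRedundancy(HAState.OBSERVER)); // false
    }
}
```

The design choice is that replica invalidation remains a decision made once, on the active, and replicated to the other namenodes through block reports rather than recomputed independently.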
[jira] [Resolved] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He resolved HDFS-17137. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed
> Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Standby/Observer NameNode should not handle the redundant-replica block logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to the excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor is then scheduled to delete the block on that DN).
> * The Standby or Observer NameNode then loads the editlog and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent an incremental block report, BlockManager#processExtraRedundancyBlock is called here as well to select a redundant-replica DN and add it to the excessRedundancyMap (the DN selected here may be inconsistent with the one selected on the active NameNode).
> A DN left in the excessRedundancyMap can block DN decommissioning, so the decommission operation cannot complete on the Standby/Observer NameNode.
> A concrete case: suppose a file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays the SetReplicationOp (at this time d1 has not yet sent an incremental block report), so the redundant replica it selects may differ from the active NameNode's choice; for example, it selects d2 and adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * On the active NameNode the block's DN list becomes d2 and d3 (d1 is removed from the excessRedundancyMap while processing the incremental block report).
> * On the standby NameNode the block's DN list is also d2 and d3 (but d2 cannot be removed from the excessRedundancyMap while processing the incremental block report).
> Now execute the decommission operation on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4, so d3 can be decommissioned normally.
> * On the standby NameNode the block's DN list is d3 (decommissioning), d2 (excess), d4; since the requirement of two live replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby/Observer NameNode should not process the redundant-replica logic when setReplication is applied.
[jira] [Updated] (HDFS-17137) Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He updated HDFS-17137: --- Summary: Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication. (was: Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication)
> Standby/Observer NameNode skip to handle redundant replica block logic when set decrease replication.
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> Standby/Observer NameNode should not handle the redundant-replica block logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to the excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor is then scheduled to delete the block on that DN).
> * The Standby or Observer NameNode then loads the editlog and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent an incremental block report, BlockManager#processExtraRedundancyBlock is called here as well to select a redundant-replica DN and add it to the excessRedundancyMap (the DN selected here may be inconsistent with the one selected on the active NameNode).
> A DN left in the excessRedundancyMap can block DN decommissioning, so the decommission operation cannot complete on the Standby/Observer NameNode.
> A concrete case: suppose a file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays the SetReplicationOp (at this time d1 has not yet sent an incremental block report), so the redundant replica it selects may differ from the active NameNode's choice; for example, it selects d2 and adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * On the active NameNode the block's DN list becomes d2 and d3 (d1 is removed from the excessRedundancyMap while processing the incremental block report).
> * On the standby NameNode the block's DN list is also d2 and d3 (but d2 cannot be removed from the excessRedundancyMap while processing the incremental block report).
> Now execute the decommission operation on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4, so d3 can be decommissioned normally.
> * On the standby NameNode the block's DN list is d3 (decommissioning), d2 (excess), d4; since the requirement of two live replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby/Observer NameNode should not process the redundant-replica logic when setReplication is applied.
[jira] [Commented] (HDFS-17137) Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751926#comment-17751926 ] ASF GitHub Bot commented on HDFS-17137: --- Hexiaoqiao commented on PR #5913: URL: https://github.com/apache/hadoop/pull/5913#issuecomment-1669086283 The failed unit test is not related to this change. Committed to trunk. Thanks @haiyang1987 for the contribution and @tomscut for the review!
> Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> Standby/Observer NameNode should not handle the redundant-replica block logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to the excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor is then scheduled to delete the block on that DN).
> * The Standby or Observer NameNode then loads the editlog and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent an incremental block report, BlockManager#processExtraRedundancyBlock is called here as well to select a redundant-replica DN and add it to the excessRedundancyMap (the DN selected here may be inconsistent with the one selected on the active NameNode).
> A DN left in the excessRedundancyMap can block DN decommissioning, so the decommission operation cannot complete on the Standby/Observer NameNode.
> A concrete case: suppose a file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays the SetReplicationOp (at this time d1 has not yet sent an incremental block report), so the redundant replica it selects may differ from the active NameNode's choice; for example, it selects d2 and adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * On the active NameNode the block's DN list becomes d2 and d3 (d1 is removed from the excessRedundancyMap while processing the incremental block report).
> * On the standby NameNode the block's DN list is also d2 and d3 (but d2 cannot be removed from the excessRedundancyMap while processing the incremental block report).
> Now execute the decommission operation on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4, so d3 can be decommissioned normally.
> * On the standby NameNode the block's DN list is d3 (decommissioning), d2 (excess), d4; since the requirement of two live replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby/Observer NameNode should not process the redundant-replica logic when setReplication is applied.
[jira] [Commented] (HDFS-17137) Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication
[ https://issues.apache.org/jira/browse/HDFS-17137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751925#comment-17751925 ] ASF GitHub Bot commented on HDFS-17137: --- Hexiaoqiao merged PR #5913: URL: https://github.com/apache/hadoop/pull/5913
> Standby/Observer NameNode should not handle redundant replica block logic when set decrease replication
> --
>
> Key: HDFS-17137
> URL: https://issues.apache.org/jira/browse/HDFS-17137
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> Standby/Observer NameNode should not handle the redundant-replica block logic when replication is decreased.
> At present, when setReplication is called to decrease replication:
> * The active NameNode calls the BlockManager#processExtraRedundancyBlock method to select the DN holding the redundant replica, adds it to the excessRedundancyMap, and adds it to invalidateBlocks (the RedundancyMonitor is then scheduled to delete the block on that DN).
> * The Standby or Observer NameNode then loads the editlog and applies the SetReplicationOp. If the DN whose replica is to be deleted has not yet sent an incremental block report, BlockManager#processExtraRedundancyBlock is called here as well to select a redundant-replica DN and add it to the excessRedundancyMap (the DN selected here may be inconsistent with the one selected on the active NameNode).
> A DN left in the excessRedundancyMap can block DN decommissioning, so the decommission operation cannot complete on the Standby/Observer NameNode.
> A concrete case: suppose a file has 3 replicas (d1, d2, d3) and setReplication sets the file to 2 replicas.
> * The active NameNode selects d1 as the redundant replica and adds it to the excessRedundancyMap and invalidateBlocks.
> * The standby NameNode replays the SetReplicationOp (at this time d1 has not yet sent an incremental block report), so the redundant replica it selects may differ from the active NameNode's choice; for example, it selects d2 and adds it to the excessRedundancyMap.
> * d1 then deletes the block and sends an incremental block report.
> * On the active NameNode the block's DN list becomes d2 and d3 (d1 is removed from the excessRedundancyMap while processing the incremental block report).
> * On the standby NameNode the block's DN list is also d2 and d3 (but d2 cannot be removed from the excessRedundancyMap while processing the incremental block report).
> Now execute the decommission operation on d3:
> * The active NameNode selects a new node d4 to copy the replica, and d4 sends an incremental block report.
> * On the active NameNode the block's DN list is d2, d3 (decommissioning), d4, so d3 can be decommissioned normally.
> * On the standby NameNode the block's DN list is d3 (decommissioning), d2 (excess), d4; since the requirement of two live replicas is not met, d3 cannot be decommissioned.
> Therefore, the Standby/Observer NameNode should not process the redundant-replica logic when setReplication is applied.
[jira] [Commented] (HDFS-17140) Optimize the BPOfferService.reportBadBlocks() method
[ https://issues.apache.org/jira/browse/HDFS-17140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751923#comment-17751923 ] ASF GitHub Bot commented on HDFS-17140: --- Hexiaoqiao commented on code in PR #5924: URL: https://github.com/apache/hadoop/pull/5924#discussion_r1286718143

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java:
##########
@@ -291,9 +291,8 @@ public String toString() {
   void reportBadBlocks(ExtendedBlock block,
       String storageUuid, StorageType storageType) {
     checkBlock(block);
+    ReportBadBlockAction rbbAction = new ReportBadBlockAction(block, storageUuid, storageType);

Review Comment:
   I am not worried about that, because `BPOfferService` is isolated per namespace. But I wonder what the improvement here is: saving the cost of creating the object and its heap footprint?

> Optimize the BPOfferService.reportBadBlocks() method
> ----------------------------------------------------
>
>                 Key: HDFS-17140
>                 URL: https://issues.apache.org/jira/browse/HDFS-17140
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Liangjun He
>            Assignee: Liangjun He
>            Priority: Minor
>              Labels: pull-request-available
>
> The current BPOfferService.reportBadBlocks() method can be optimized by
> moving the creation of the rbbAction object outside the loop.
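The pattern under discussion in the diff above is simply hoisting the construction of an immutable value out of the loop that enqueues it. A minimal self-contained illustration (plain Java; `Action` stands in for `ReportBadBlockAction`, and the queues stand in for the per-NameNode actor queues inside `BPOfferService`):

```java
import java.util.*;

/** Illustration of the HDFS-17140 pattern: a loop that hands the same
 *  immutable value to several consumers only needs to construct it once. */
public class HoistExample {

  /** Immutable value object, analogous to ReportBadBlockAction. */
  static final class Action {
    final String blockId;
    Action(String blockId) { this.blockId = blockId; }
  }

  /** Before the patch, a new Action was created per actor inside the loop.
   *  After (shown here), one shared instance is enqueued everywhere, which
   *  is safe because Action is immutable and each actor only reads it. */
  static List<Queue<Action>> report(String blockId, int actorCount) {
    List<Queue<Action>> queues = new ArrayList<>();
    for (int i = 0; i < actorCount; i++) {
      queues.add(new ArrayDeque<>());
    }

    Action shared = new Action(blockId);  // hoisted out of the loop
    for (Queue<Action> q : queues) {
      q.add(shared);                      // every queue holds the same object
    }
    return queues;
  }

  public static void main(String[] args) {
    List<Queue<Action>> qs = report("blk_1", 2);
    // Same reference in both queues: one allocation instead of two.
    System.out.println(qs.get(0).peek() == qs.get(1).peek());  // prints true
  }
}
```

The saving is exactly what the review comment asks about: one object allocation (and its heap footprint) per call instead of one per actor, which is valid only as long as the shared object is immutable.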
[jira] [Commented] (HDFS-17143) Optimize the logic for reconfigure ReadStrategy enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-17143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751921#comment-17751921 ] ASF GitHub Bot commented on HDFS-17143: --- huangzhaobo99 commented on PR #5930: URL: https://github.com/apache/hadoop/pull/5930#issuecomment-1669064696

   Hi @slfan1989, could you help review this when you have time? Thanks.

> Optimize the logic for reconfigure ReadStrategy enable for Namenode
> -------------------------------------------------------------------
>
>                 Key: HDFS-17143
>                 URL: https://issues.apache.org/jira/browse/HDFS-17143
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: huangzhaobo99
>            Assignee: huangzhaobo99
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Commented] (HDFS-17143) Optimize the logic for reconfigure ReadStrategy enable for Namenode
[ https://issues.apache.org/jira/browse/HDFS-17143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751920#comment-17751920 ] ASF GitHub Bot commented on HDFS-17143: --- huangzhaobo99 commented on PR #5930: URL: https://github.com/apache/hadoop/pull/5930#issuecomment-1669062915

   The failed unit tests are unrelated to this change, and they pass locally.