[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=795540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795540 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 27/Jul/22 07:18
Start Date: 27/Jul/22 07:18
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r930703886

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -201,6 +201,7 @@ private void check() {
           iterkey).iterator();
       final List toRemove = new ArrayList<>();
       final List unhealthyDns = new ArrayList<>();
+      boolean inValidState = false;

Review Comment:
Thanks @jojochuang - I have made the required changes.

Issue Time Tracking
---

Worklog Id: (was: 795540)
Time Spent: 3h 10m (was: 3h)

> DatanodeAdminManager$Monitor reports a node as invalid continuously
> ---
>
> Key: HDFS-16676
> URL: https://issues.apache.org/jira/browse/HDFS-16676
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.2.1
> Reporter: Prabhu Joseph
> Assignee: groot
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> DatanodeAdminManager$Monitor reports a node as invalid continuously
> {code}
> 2022-07-21 06:54:38,562 WARN org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager (DatanodeAdminMonitor-0): DatanodeAdminMonitor caught exception when processing node 1.2.3.4:9866.
> java.lang.IllegalStateException: Node 1.2.3.4:9866 is in an invalid state! Invalid state: In Service 0 blocks are on this dn.
>         at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.check(DatanodeAdminManager.java:601)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeAdminManager$Monitor.run(DatanodeAdminManager.java:504)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750)
> {code}
> A node goes into an invalid state when stopDecommission sets the node to In Service but fails to remove it from the pendingNodes queues (HDFS-16675). The state is corrected only when the user triggers startDecommission. Until then, there is no need to keep the invalid-state node in the queue, since startDecommission will add it back anyway.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
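The issue description proposes a change in how the monitor handles a node it finds back In Service. Instead of failing the same precondition on every pass, the monitor should stop tracking that node, because startDecommission will re-queue it. Below is a minimal, self-contained sketch of that idea under stated assumptions. `AdminMonitorSketch`, its `AdminState` enum, and the `tracked` map are hypothetical stand-ins for `DatanodeAdminDefaultMonitor`, the datanode admin states, and the monitor's internal queues. This is not the code in PR #4626.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the admin monitor; not the Hadoop implementation.
public class AdminMonitorSketch {
  // Stand-in for the datanode admin states relevant here.
  enum AdminState { IN_SERVICE, DECOMMISSION_IN_PROGRESS, ENTERING_MAINTENANCE }

  // Nodes the monitor is tracking, keyed by "host:port", with their current state.
  final Map<String, AdminState> tracked = new LinkedHashMap<>();

  /**
   * One monitor pass. A tracked node that is already In Service is treated as
   * invalid: it is reported once and dropped from tracking, instead of failing
   * a precondition on every pass while staying queued.
   * Returns the nodes removed during this pass.
   */
  List<String> check() {
    final List<String> toRemove = new ArrayList<>();
    for (Map.Entry<String, AdminState> e : tracked.entrySet()) {
      boolean isValid = e.getValue() != AdminState.IN_SERVICE;
      if (!isValid) {
        System.err.println("Node " + e.getKey() + " is in an invalid state; untracking it");
        toRemove.add(e.getKey());
      }
      // Real block-scanning work for nodes in a valid state would go here.
    }
    // Remove after iterating to avoid a ConcurrentModificationException.
    for (String node : toRemove) {
      tracked.remove(node);
    }
    return toRemove;
  }

  public static void main(String[] args) {
    AdminMonitorSketch monitor = new AdminMonitorSketch();
    monitor.tracked.put("1.2.3.4:9866", AdminState.IN_SERVICE);
    monitor.tracked.put("5.6.7.8:9866", AdminState.DECOMMISSION_IN_PROGRESS);
    System.out.println(monitor.check()); // prints [1.2.3.4:9866]
    System.out.println(monitor.check()); // prints [] (the invalid node is no longer queued)
  }
}
```

The sketch collects removals in `toRemove` and applies them after the loop, mirroring the `toRemove` list visible in the diff hunk. Calling `Iterator.remove()` inside the loop would work equally well.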
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=795364=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795364 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 26/Jul/22 17:44
Start Date: 26/Jul/22 17:44
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#issuecomment-1195790209

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 43m 27s | | trunk passed |
| +1 :green_heart: | compile | 1m 45s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 35s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 17s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 43s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 20s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 42s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 52s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 18s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 24s | | the patch passed |
| +1 :green_heart: | compile | 1m 29s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 29s | | the patch passed |
| +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 17s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 1s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 26s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 33s | | the patch passed |
| +1 :green_heart: | shadedclient | 25m 25s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 337m 3s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. |
| | | 457m 4s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4626/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4626 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c71a67c7dd9d 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / a3c4dc90d5108fb8eb57f5efef312e295c8128e2 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4626/2/testReport/ |
| Max. process+thread count | 2201 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=795123=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795123 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 26/Jul/22 04:58
Start Date: 26/Jul/22 04:58
Worklog Time Spent: 10m
Work Description: jojochuang commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r929524721

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -201,6 +201,7 @@ private void check() {
           iterkey).iterator();
       final List toRemove = new ArrayList<>();
       final List unhealthyDns = new ArrayList<>();
+      boolean inValidState = false;

Review Comment:
IMO the variable name is confusing at first glance. Does `true` mean "valid" or "invalid"? How about renaming it to "isValid"?

Issue Time Tracking
---

Worklog Id: (was: 795123)
Time Spent: 2h 50m (was: 2h 40m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=795120=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795120 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 26/Jul/22 03:54
Start Date: 26/Jul/22 03:54
Worklog Time Spent: 10m
Work Description: PrabhuJoseph commented on PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#issuecomment-1194967117

Thanks @ashutoshcipher for the patch. Can you include a test case?

Issue Time Tracking
---

Worklog Id: (was: 795120)
Time Spent: 2h 40m (was: 2.5h)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=795003=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795003 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 17:56
Start Date: 25/Jul/22 17:56
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#issuecomment-1194417858

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 12s | | trunk passed |
| +1 :green_heart: | compile | 1m 43s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 23s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 46s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 49s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 39s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 6s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 24s | | the patch passed |
| +1 :green_heart: | compile | 1m 27s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 27s | | the patch passed |
| +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 20s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 1s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 25s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 18s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 35s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 243m 29s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 14s | | The patch does not generate ASF License warnings. |
| | | 353m 28s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4626/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4626 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 31b1acdfdf9e 4.15.0-156-generic #163-Ubuntu SMP Thu Aug 19 23:31:58 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 28c374cf535405254f4183528864014ea5776fc8 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4626/1/testReport/ |
| Max. process+thread count | 3405 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794931=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794931 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:43
Start Date: 25/Jul/22 14:43
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#issuecomment-1194143412

Thanks @slfan1989 for your review and approval :)

Issue Time Tracking
---

Worklog Id: (was: 794931)
Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794929=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794929 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:40
Start Date: 25/Jul/22 14:40
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928961283

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -201,6 +201,7 @@ private void check() {
           iterkey).iterator();
       final List toRemove = new ArrayList<>();
       final List unhealthyDns = new ArrayList<>();
+      boolean inValidState = false;
       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem

Review Comment:
This change wasn't made by me. To fix any such existing `Single Line` issues, a new JIRA can be created to clean up or modify them at the module level.

Issue Time Tracking
---

Worklog Id: (was: 794929)
Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794928 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:39
Start Date: 25/Jul/22 14:39
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928963626

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -265,6 +266,7 @@ private void check() {
             // to track maintenance expiration.
             dnAdmin.setInMaintenance(dn);
           } else {
+            inValidState = true;
             Preconditions.checkState(false,

Review Comment:
> "Node %s is in an invalid state! " + "Invalid state: %s %s blocks are on this dn."

I am not sure there is really an indentation issue here, and this line was not added by my change. To correct existing indentation, a separate JIRA can be created to track and fix it at the module level if required.

Issue Time Tracking
---

Worklog Id: (was: 794928)
Time Spent: 2h (was: 1h 50m)
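The template quoted in this comment accounts for the wording of the logged exception. The sketch below fills it with `String.format`. It assumes the three `%s` placeholders receive the node address, the admin state's display name, and a block count. Those arguments are inferred from the log line in the issue description, not read from the Hadoop source. Guava's `Preconditions.checkState` does its own `%s`-only substitution, so `String.format` is used here only as a dependency-free stand-in. `InvalidStateMessageSketch` and `message` are hypothetical names.

```java
// Hypothetical reproduction of the invalid-state message; the argument choice is assumed.
public class InvalidStateMessageSketch {
  // Template text as quoted in the review comment.
  static final String TEMPLATE = "Node %s is in an invalid state! "
      + "Invalid state: %s %s blocks are on this dn.";

  // Fills the template with the (assumed) node, admin state, and block count.
  static String message(String node, String adminState, int blockCount) {
    return String.format(TEMPLATE, node, adminState, blockCount);
  }

  public static void main(String[] args) {
    // Prints the same text as the IllegalStateException in the issue description.
    System.out.println(message("1.2.3.4:9866", "In Service", 0));
  }
}
```

Because the state and the count are separated only by a space, the message reads as "In Service 0 blocks", which is easy to misparse when scanning logs.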
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794926=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794926 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:37
Start Date: 25/Jul/22 14:37
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928961283

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -201,6 +201,7 @@ private void check() {
           iterkey).iterator();
       final List toRemove = new ArrayList<>();
       final List unhealthyDns = new ArrayList<>();
+      boolean inValidState = false;
       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem

Review Comment:
This change wasn't made by me. To fix any such issues, a new JIRA can be created to clean up or modify them at the module level.

Issue Time Tracking
---

Worklog Id: (was: 794926)
Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794919 ]

ASF GitHub Bot logged work on HDFS-16676:

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:27
Start Date: 25/Jul/22 14:27
Worklog Time Spent: 10m
Work Description: slfan1989 commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928949967

##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java:
##
@@ -201,6 +201,7 @@ private void check() {
           iterkey).iterator();
       final List toRemove = new ArrayList<>();
       final List unhealthyDns = new ArrayList<>();
+      boolean inValidState = false;
       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem

Review Comment:
`while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem.isRunning())`

Issue Time Tracking
---

Worklog Id: (was: 794919)
Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794917=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794917 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 14:26
Start Date: 25/Jul/22 14:26
Worklog Time Spent: 10m
Work Description: slfan1989 commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928949038

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -265,6 +266,7 @@ private void check() {
 // to track maintenance expiration.
 dnAdmin.setInMaintenance(dn);
 } else {
+ inValidState = true;
 Preconditions.checkState(false,

Review Comment:
`"Node %s is in an invalid state! " + "Invalid state: %s %s blocks are on this dn.",`

Issue Time Tracking
---

Worklog Id: (was: 794917)
Time Spent: 1.5h (was: 1h 20m)
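The message template quoted in this review comment accounts for the text of the IllegalStateException in the issue's log. Guava's `Preconditions.checkState(boolean, String, Object...)` replaces each `%s` in the template with the next argument; for a template that uses only `%s`, `String.format` yields the same text, so the sketch below uses it instead of depending on Guava. The class name `InvalidStateMessageSketch` is hypothetical, and the three arguments (node address, admin state, block count) are inferred from the logged message rather than taken from the Hadoop source.

```java
// Hypothetical sketch: expands the checkState message template from PR #4626.
final class InvalidStateMessageSketch {

  // Template quoted in the review comment (same string concatenation).
  static final String TEMPLATE =
      "Node %s is in an invalid state! " + "Invalid state: %s %s blocks are on this dn.";

  /**
   * Guava's Preconditions only understands %s placeholders; for a template that
   * contains nothing else, String.format produces the same text.
   */
  static String message(Object node, Object adminState, Object blockCount) {
    return String.format(TEMPLATE, node, adminState, blockCount);
  }

  public static void main(String[] args) {
    // Arguments assumed from the log line in the issue description.
    System.out.println(message("1.2.3.4:9866", "In Service", 0));
  }
}
```

Run with those arguments, it prints the same message that appears in the WARN log of the issue description, which is why the reviewer's suggested two-line string concatenation leaves the output unchanged.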
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794903=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794903 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 13:43
Start Date: 25/Jul/22 13:43
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928899669

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -288,7 +290,11 @@ private void check() {
 // an invalid state.
 LOG.warn("DatanodeAdminMonitor caught exception when processing node "
 + "{}.", dn, e);

Review Comment:
I haven't made this change. To correct Log Single Line issues, if any are found and required, a separate JIRA (to check and fix them in different modules) can be created, I think.

Issue Time Tracking
---

Worklog Id: (was: 794903)
Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794901=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794901 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 13:42
Start Date: 25/Jul/22 13:42
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928900049

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -201,6 +201,7 @@ private void check() {
 iterkey).iterator();
 final List toRemove = new ArrayList<>();
 final List unhealthyDns = new ArrayList<>();
+boolean inValidState = false;
 while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem

Review Comment:
What do you mean by Single Line?

Issue Time Tracking
---

Worklog Id: (was: 794901)
Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794900=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794900 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 13:42
Start Date: 25/Jul/22 13:42
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928899669

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -288,7 +290,11 @@ private void check() {
 // an invalid state.
 LOG.warn("DatanodeAdminMonitor caught exception when processing node "
 + "{}.", dn, e);

Review Comment:
I haven't made this change. To correct Log Single Line issues, if any are found and required, a separate JIRA can be created, I think.

Issue Time Tracking
---

Worklog Id: (was: 794900)
Time Spent: 1h (was: 50m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794899=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794899 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 13:41
Start Date: 25/Jul/22 13:41
Worklog Time Spent: 10m
Work Description: ashutoshcipher commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928898679

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -265,6 +266,7 @@ private void check() {
 // to track maintenance expiration.
 dnAdmin.setInMaintenance(dn);
 } else {
+ inValidState = true;
 Preconditions.checkState(false,

Review Comment:
What's the indentation issue here?

Issue Time Tracking
---

Worklog Id: (was: 794899)
Time Spent: 50m (was: 40m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794871=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794871 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:35
Start Date: 25/Jul/22 12:35
Worklog Time Spent: 10m
Work Description: slfan1989 commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928833032

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -201,6 +201,7 @@ private void check() {
 iterkey).iterator();
 final List toRemove = new ArrayList<>();
 final List unhealthyDns = new ArrayList<>();
+boolean inValidState = false;
 while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem

Review Comment:
Single Line?

Issue Time Tracking
---

Worklog Id: (was: 794871)
Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794870=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794870 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:35
Start Date: 25/Jul/22 12:35
Worklog Time Spent: 10m
Work Description: slfan1989 commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928832831

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -265,6 +266,7 @@ private void check() {
 // to track maintenance expiration.
 dnAdmin.setInMaintenance(dn);
 } else {
+ inValidState = true;
 Preconditions.checkState(false,

Review Comment:
indentation

Issue Time Tracking
---

Worklog Id: (was: 794870)
Time Spent: 0.5h (was: 20m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794869=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794869 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:34
Start Date: 25/Jul/22 12:34
Worklog Time Spent: 10m
Work Description: slfan1989 commented on code in PR #4626:
URL: https://github.com/apache/hadoop/pull/4626#discussion_r928832301

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java: ##
@@ -288,7 +290,11 @@ private void check() {
 // an invalid state.
 LOG.warn("DatanodeAdminMonitor caught exception when processing node "
 + "{}.", dn, e);

Review Comment:
Log Single Line?

Issue Time Tracking
---

Worklog Id: (was: 794869)
Time Spent: 20m (was: 10m)
[jira] [Work logged] (HDFS-16676) DatanodeAdminManager$Monitor reports a node as invalid continuously
[ https://issues.apache.org/jira/browse/HDFS-16676?focusedWorklogId=794862=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794862 ]

ASF GitHub Bot logged work on HDFS-16676:
---

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:01
Start Date: 25/Jul/22 12:01
Worklog Time Spent: 10m
Work Description: ashutoshcipher opened a new pull request, #4626:
URL: https://github.com/apache/hadoop/pull/4626

### Description of PR
DatanodeAdminManager$Monitor reports a node as invalid continuously

JIRA - HDFS-16676

### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

Issue Time Tracking
---

Worklog Id: (was: 794862)
Remaining Estimate: 0h
Time Spent: 10m