[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650104#comment-17650104 ]

ASF GitHub Bot commented on HDFS-16689:
---------------------------------------

hadoop-yetus commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1360928620

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 40s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 4 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 41s | | trunk passed |
| +1 :green_heart: | compile | 1m 26s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 1m 20s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 6s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 28s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 30s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 47s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 16s | | the patch passed |
| +1 :green_heart: | compile | 1m 17s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 1m 17s | | the patch passed |
| +1 :green_heart: | compile | 1m 12s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 1m 12s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 52s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 260 unchanged - 1 fixed = 260 total (was 261) |
| +1 :green_heart: | mvnsite | 1m 23s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 49s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javadoc | 1m 26s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 13s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 21s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 302m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 50s | | The patch does not generate ASF License warnings. |
| | | | 408m 42s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4744 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c23c319f91b7 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 0d25fbee414a5cf318a3b9b9c831f5ae0aaf7d18 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4744/20/testReport/ |
| Max. process+thread count | 3334 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650099#comment-17650099 ]

ASF GitHub Bot commented on HDFS-16872:
---------------------------------------

ChengbingLiu commented on code in PR #5246:
URL: https://github.com/apache/hadoop/pull/5246#discussion_r1054026139

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/log/LogThrottlingHelper.java:

@@ -259,6 +259,10 @@ public LogAction record(String recorderName, long currentTimeMs,
       currentLogs.put(recorderName, currentLog);
     }
     currentLog.recordValues(values);
+    if (currentTimeMs < lastLogTimestampMs) {
+      // Reset lastLogTimestampMs: this should only happen in tests
+      lastLogTimestampMs = Long.MIN_VALUE;
+    }

Review Comment: I thought about this solution, but the problem is that we have to add `reset()` methods in `LogThrottlingHelper` as well as `FSEditLogLoader`. Will this be OK?

> Fix log throttling by declaring LogThrottlingHelper as static members
> ---------------------------------------------------------------------
>
> Key: HDFS-16872
> URL: https://issues.apache.org/jira/browse/HDFS-16872
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.4
> Reporter: Chengbing Liu
> Priority: Major
> Labels: pull-request-available
>
> In our production cluster with Observer NameNode enabled, we have plenty of logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The {{LogThrottlingHelper}} doesn't seem to work.
> {noformat}
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688]' to transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 1.0, total load time 0.0 ms
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Start loading edits file ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693]' to transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loaded 1 edits file(s)
(the last named ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 5.0, total load time 1.0 ms
> {noformat}
> After some digging, I found the cause is that {{LogThrottlingHelper}}'s are declared as instance variables of all the enclosing classes, including {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. Therefore the logging frequency will not be limited across different instances. For classes with only a limited number of instances, such as {{FSImage}}, this is fine. For others whose instances are created frequently, such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it will result in plenty of logs.
> This can be fixed by declaring {{LogThrottlingHelper}}'s as static members.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
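The root cause described above, throttle state living inside short-lived objects, can be sketched in a few lines. This is a minimal toy and not the actual Hadoop `LogThrottlingHelper` API (`Throttler`, `EditLogLoader`, and `shouldLog` are hypothetical names): a helper held in a static field keeps its last-log timestamp across loader instances, so a second instance created inside the throttle window stays quiet.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of time-based log throttling (not Hadoop's real class).
class Throttler {
    private final long periodMs;
    private final AtomicLong lastLogMs = new AtomicLong(Long.MIN_VALUE);

    Throttler(long periodMs) { this.periodMs = periodMs; }

    /** Returns true if a log line should be emitted at the given timestamp. */
    boolean shouldLog(long nowMs) {
        long last = lastLogMs.get();
        if (last == Long.MIN_VALUE || nowMs - last >= periodMs) {
            lastLogMs.set(nowMs);
            return true;
        }
        return false;
    }
}

class EditLogLoader {
    // Static: throttle state is shared by every loader instance. Declared as
    // an instance field instead, each new loader would start fresh and log
    // unconditionally, which is the bug HDFS-16872 describes.
    static final Throttler THROTTLE = new Throttler(5000);

    boolean load(long nowMs) { return THROTTLE.shouldLog(nowMs); }
}

public class ThrottleDemo {
    public static void main(String[] args) {
        // Two loader instances created within the 5s window: only the first
        // one logs, because the helper is static and therefore shared.
        boolean first = new EditLogLoader().load(1000);
        boolean second = new EditLogLoader().load(2000);
        System.out.println(first + " " + second); // prints "true false"
    }
}
```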
[jira] [Resolved] (HDFS-16873) FileStatus compareTo does not specify ordering
[ https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDFS-16873.
------------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

> FileStatus compareTo does not specify ordering
> ----------------------------------------------
>
> Key: HDFS-16873
> URL: https://issues.apache.org/jira/browse/HDFS-16873
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: DDillon
> Assignee: DDillon
> Priority: Trivial
> Labels: pull-request-available
> Fix For: 3.4.0
>
> The Javadoc of FileStatus does not specify the field and manner in which objects are ordered. This is critical to understand before relying on the Comparable interface, to avoid making assumptions. A quick inspection of the code showed that ordering is by path name, but users shouldn't have to read the code to confirm seemingly obvious assumptions.
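The behavior the issue asks to document, ordering delegated to the path, can be illustrated with a small stand-in class. `SimpleFileStatus` below is hypothetical, not Hadoop's `FileStatus`; it only shows the documented-ordering pattern the fix advocates.

```java
import java.util.Arrays;

// Hypothetical stand-in for a status object whose Comparable contract is
// spelled out in the Javadoc rather than left for readers to discover.
class SimpleFileStatus implements Comparable<SimpleFileStatus> {
    private final String path;

    SimpleFileStatus(String path) { this.path = path; }

    String getPath() { return path; }

    /**
     * Orders by path name, lexicographically. Stating this in the Javadoc is
     * exactly the gap HDFS-16873 closes for FileStatus.
     */
    @Override
    public int compareTo(SimpleFileStatus other) {
        return this.path.compareTo(other.path);
    }

    @Override
    public String toString() { return path; }
}

public class OrderingDemo {
    public static void main(String[] args) {
        SimpleFileStatus[] statuses = {
            new SimpleFileStatus("/b"), new SimpleFileStatus("/a")
        };
        Arrays.sort(statuses); // uses compareTo, i.e. path order
        System.out.println(Arrays.toString(statuses)); // prints "[/a, /b]"
    }
}
```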
[jira] [Commented] (HDFS-16873) FileStatus compareTo does not specify ordering
[ https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650052#comment-17650052 ]

ASF GitHub Bot commented on HDFS-16873:
---------------------------------------

jojochuang commented on PR #5219:
URL: https://github.com/apache/hadoop/pull/5219#issuecomment-1360694288

Javadoc build error is unrelated. Merging it now.
[jira] [Commented] (HDFS-16873) FileStatus compareTo does not specify ordering
[ https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650053#comment-17650053 ]

ASF GitHub Bot commented on HDFS-16873:
---------------------------------------

jojochuang merged PR #5219:
URL: https://github.com/apache/hadoop/pull/5219
[jira] [Updated] (HDFS-16873) FileStatus compareTo does not specify ordering
[ https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16873:
----------------------------------
Labels: pull-request-available (was: )
[jira] [Assigned] (HDFS-16873) FileStatus compareTo does not specify ordering
[ https://issues.apache.org/jira/browse/HDFS-16873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang reassigned HDFS-16873:
--------------------------------------
Assignee: DDillon
[jira] [Commented] (HDFS-16831) [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes every time
[ https://issues.apache.org/jira/browse/HDFS-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650042#comment-17650042 ]

ASF GitHub Bot commented on HDFS-16831:
---------------------------------------

ZanderXu commented on PR #5098:
URL: https://github.com/apache/hadoop/pull/5098#issuecomment-1360641561

@simbadzina Master, can you help me review this PR when you are available?

> [RBF SBN] GetNamenodesForNameserviceId should shuffle Observer NameNodes every time
> -----------------------------------------------------------------------------------
>
> Key: HDFS-16831
> URL: https://issues.apache.org/jira/browse/HDFS-16831
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
>
> The method getNamenodesForNameserviceId in MembershipNamenodeResolver.class should shuffle Observer NameNodes every time. The current logic returns the cached list as-is, which causes all read requests to be forwarded to the first observer NameNode.
>
> The related code is as follows:
> {code:java}
> @Override
> public List<? extends FederationNamenodeContext> getNamenodesForNameserviceId(
>     final String nsId, boolean listObserversFirst) throws IOException {
>   List<? extends FederationNamenodeContext> ret =
>       cacheNS.get(Pair.of(nsId, listObserversFirst));
>   if (ret != null) {
>     return ret;
>   }
>   ...
> }{code}
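The fix idea behind the report can be sketched independently of the router code. The names below (`ObserverResolver`, `getNamenodes`) are hypothetical, not the real `MembershipNamenodeResolver` API: instead of handing callers the cached list, whose fixed order pins every read on the first observer, copy it and shuffle the copy on each call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: per-call shuffle over a defensive copy of the cache,
// so repeated lookups spread reads across observers instead of always
// returning the same first entry.
class ObserverResolver {
    private final List<String> cached; // cached resolution, observers first
    private final Random random;

    ObserverResolver(List<String> cached, Random random) {
        this.cached = cached;
        this.random = random;
    }

    List<String> getNamenodes() {
        // Never expose the cached list itself: copy, then shuffle the copy.
        List<String> ret = new ArrayList<>(cached);
        Collections.shuffle(ret, random);
        return ret;
    }
}
```

A real fix would shuffle only the observer prefix of the list (keeping active/standby ordering intact); the copy-then-shuffle shape is the point here.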
[jira] [Commented] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650028#comment-17650028 ]

ASF GitHub Bot commented on HDFS-16764:
---------------------------------------

xkrogen commented on PR #4872:
URL: https://github.com/apache/hadoop/pull/4872#issuecomment-1360540535

Merged to trunk and backported to `branch-3.3` (some minor conflicts in the imports). Thanks for the contribution @ZanderXu !

> ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
> -------------------------------------------------------------------------
>
> Key: HDFS-16764
> URL: https://issues.apache.org/jira/browse/HDFS-16764
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.6
>
> ObserverNameNode can currently handle the addBlock RPC, but it may throw a FileNotFoundException when its state is stale.
> * AddBlock is not a coordinated method, so the Observer will not check the stateId.
> * AddBlock does its validation with checkOperation(OperationCategory.READ).
> So the observer can handle the addBlock RPC. If this observer has not yet replayed the edit that created the file, it will throw a FileNotFoundException during validation.
> The related code is as follows:
> {code:java}
> checkOperation(OperationCategory.READ);
> final FSPermissionChecker pc = getPermissionChecker();
> FSPermissionChecker.setOperationType(operationName);
> readLock();
> try {
>   checkOperation(OperationCategory.READ);
>   r = FSDirWriteFileOp.validateAddBlock(this, pc, src, fileId, clientName,
>       previous, onRetryBlock);
> } finally {
>   readUnlock(operationName);
> } {code}
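The failure mode described above can be modeled in miniature. Everything here is hypothetical (`ToyObserver`, `replayCreate`, `validateAddBlock` are illustrative names, not Hadoop classes): a non-coordinated call skips the wait-until-caught-up step a coordinated read would perform, so it may look up a file whose create-edit the observer has not replayed yet.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Toy model of an observer namespace that lags behind the active NameNode.
class ToyObserver {
    private final Map<String, Long> files = new HashMap<>(); // path -> create txid
    private long lastAppliedTxId = 0;

    /** Applying an edit makes the file visible and advances the applied txid. */
    void replayCreate(String path, long txid) {
        files.put(path, txid);
        lastAppliedTxId = txid;
    }

    /** What a coordinated read checks first: has the observer caught up? */
    boolean caughtUpTo(long clientStateId) {
        return lastAppliedTxId >= clientStateId;
    }

    /**
     * Un-coordinated validation: a straight lookup with no stateId check,
     * so a stale observer throws FileNotFoundException for a file the
     * active NameNode already created.
     */
    void validateAddBlock(String path) throws FileNotFoundException {
        if (!files.containsKey(path)) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
    }
}
```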
[jira] [Updated] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated HDFS-16764:
-------------------------------
Fix Version/s: 3.4.0
               3.3.6
[jira] [Commented] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException
[ https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650026#comment-17650026 ]

ASF GitHub Bot commented on HDFS-16764:
---------------------------------------

xkrogen merged PR #4872:
URL: https://github.com/apache/hadoop/pull/4872
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650025#comment-17650025 ]

ASF GitHub Bot commented on HDFS-16872:
---------------------------------------

xkrogen commented on code in PR #5246:
URL: https://github.com/apache/hadoop/pull/5246#discussion_r1053850133

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/log/LogThrottlingHelper.java:

@@ -259,6 +259,10 @@ public LogAction record(String recorderName, long currentTimeMs,
       currentLogs.put(recorderName, currentLog);
     }
     currentLog.recordValues(values);
+    if (currentTimeMs < lastLogTimestampMs) {
+      // Reset lastLogTimestampMs: this should only happen in tests
+      lastLogTimestampMs = Long.MIN_VALUE;
+    }

Review Comment: This is a bit awkward. How about instead we add a new method like `reset()` that clears all of the state and call this in the `beforeEach()` for the test?
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17650024#comment-17650024 ]

ASF GitHub Bot commented on HDFS-16872:
---------------------------------------

xkrogen commented on code in PR #5246:
URL: https://github.com/apache/hadoop/pull/5246#discussion_r1053850133

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/log/LogThrottlingHelper.java:

@@ -259,6 +259,10 @@ public LogAction record(String recorderName, long currentTimeMs,
       currentLogs.put(recorderName, currentLog);
     }
     currentLog.recordValues(values);
+    if (currentTimeMs < lastLogTimestampMs) {
+      // Reset lastLogTimestampMs: this should only happen in tests
+      lastLogTimestampMs = Long.MIN_VALUE;
+    }

Review Comment: This is a bit awkward. How about instead we add a new method like `reset()` that clears all of the state?
[jira] [Resolved] (HDFS-16871) DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in hostname
[ https://issues.apache.org/jira/browse/HDFS-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang resolved HDFS-16871.
------------------------------------
Fix Version/s: 3.4.0
Resolution: Fixed

> DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in hostname
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-16871
> URL: https://issues.apache.org/jira/browse/HDFS-16871
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Daniel Ma
> Assignee: Daniel Ma
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
> Attachments: screenshot-1.png, screenshot-2.png
>
> The DiskBalancer process reads DataNode hostnames as lowercase letters,
> !screenshot-1.png!
> but there is no letter-case transform in getNodeByName.
> !screenshot-2.png!
> For a DataNode with a lowercase hostname, everything is OK.
> But for a DataNode with an uppercase hostname, when the Balancer process tries to migrate data on it, an IllegalArgumentException is thrown, as below:
> {code:java}
> 2022-10-09 16:15:26,631 ERROR tools.DiskBalancerCLI:
> java.lang.IllegalArgumentException: Unable to find the specified node. node-group-1YlRf0002
> {code}
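The mismatch the issue describes, lowercased storage vs. unnormalized lookup, reduces to one missing normalization. The sketch below uses hypothetical names (`NodeRegistry`, `register`; only the "Unable to find the specified node" message echoes the report) and is not the actual DiskBalancer code:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: hostnames are stored lowercased, so the lookup must lowercase its
// argument the same way, or any DataNode registered under an uppercase
// hostname appears "not found".
class NodeRegistry {
    private final Map<String, String> nodesByName = new HashMap<>();

    void register(String hostname) {
        // Stored lowercased, mirroring how DiskBalancer reads hostnames.
        nodesByName.put(hostname.toLowerCase(Locale.ROOT), hostname);
    }

    /** Fixed lookup: normalize case before searching the map. */
    String getNodeByName(String hostname) {
        String node = nodesByName.get(hostname.toLowerCase(Locale.ROOT));
        if (node == null) {
            throw new IllegalArgumentException(
                "Unable to find the specified node. " + hostname);
        }
        return node;
    }
}
```

Without the `toLowerCase` in the lookup, `getNodeByName("node-group-1YlRf0002")` would miss the map entry keyed by the all-lowercase form, which is exactly the exception in the report.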
[jira] [Commented] (HDFS-16871) DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in its hostname
[ https://issues.apache.org/jira/browse/HDFS-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17650022#comment-17650022 ] ASF GitHub Bot commented on HDFS-16871: --- jojochuang merged PR #5240: URL: https://github.com/apache/hadoop/pull/5240
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649945#comment-17649945 ] ASF GitHub Bot commented on HDFS-16872: --- hadoop-yetus commented on PR #5246: URL: https://github.com/apache/hadoop/pull/5246#issuecomment-1360046069 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 2m 45s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 15m 13s | | Maven dependency ordering for branch | | -1 :x: | mvninstall | 17m 43s | [/branch-mvninstall-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5246/1/artifact/out/branch-mvninstall-root.txt) | root in trunk failed. | | +1 :green_heart: | compile | 28m 56s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 25m 44s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 4m 30s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 56s | | trunk passed | | -1 :x: | javadoc | 1m 16s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5246/1/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. 
| | +1 :green_heart: | javadoc | 2m 38s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 6m 32s | | trunk passed | | +1 :green_heart: | shadedclient | 27m 14s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 27s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 30s | | the patch passed | | +1 :green_heart: | compile | 26m 21s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 26m 21s | | the patch passed | | +1 :green_heart: | compile | 24m 36s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 24m 36s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 5m 35s | | the patch passed | | +1 :green_heart: | mvnsite | 5m 8s | | the patch passed | | -1 :x: | javadoc | 1m 9s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5246/1/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. | | +1 :green_heart: | javadoc | 2m 32s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 7m 12s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 9s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 35s | | hadoop-common in the patch passed. 
| | -1 :x: | unit | 578m 54s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5246/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 27s | | The patch does not generate ASF License warnings. | | | | 836m 51s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestReconstructStripedFileWithValidator | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand | | | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | |
[jira] [Commented] (HDFS-16871) DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in its hostname
[ https://issues.apache.org/jira/browse/HDFS-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649647#comment-17649647 ] ASF GitHub Bot commented on HDFS-16871: --- slfan1989 commented on PR #5240: URL: https://github.com/apache/hadoop/pull/5240#issuecomment-1359064531 > Could you pls help to review this PR? LGTM +1. We'd better trigger a compile again.
[jira] [Commented] (HDFS-16871) DiskBalancer process may throw IllegalArgumentException when the target DataNode has a capital letter in its hostname
[ https://issues.apache.org/jira/browse/HDFS-16871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649625#comment-17649625 ] ASF GitHub Bot commented on HDFS-16871: --- Daniel-009497 commented on PR #5240: URL: https://github.com/apache/hadoop/pull/5240#issuecomment-1358983399 @slfan1989 Could you pls help to review this PR?