[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging
[ https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286879#comment-17286879 ] Bhavik Patel commented on HDFS-15835: - Thank you, [~tasanuma]! > Erasure coding: Add/remove logs for the better readability/debugging > > > Key: HDFS-15835 > URL: https://issues.apache.org/jira/browse/HDFS-15835 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs > Reporter: Bhavik Patel > Assignee: Bhavik Patel > Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15835.001.patch > > > * Unnecessary Namenode logs are displayed when disabling EC policies that are > already disabled. > * There are no info/debug logs present for addPolicy, unsetPolicy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
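The two bullets above boil down to logging only on actual state changes. A minimal, hypothetical sketch of that guard-and-log pattern (the ECPolicyManager class and its method names here are illustrative, not Hadoop's actual erasure-coding manager API):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: a disable of an already-disabled EC policy is a
// no-op visible at DEBUG instead of emitting a misleading INFO line.
public class ECPolicyManager {
    private final Set<String> enabled = new HashSet<>();

    public boolean enablePolicy(String name) {
        boolean changed = enabled.add(name);
        if (changed) {
            System.out.println("INFO: enabled EC policy " + name);
        }
        return changed;
    }

    public boolean disablePolicy(String name) {
        boolean changed = enabled.remove(name);
        if (changed) {
            System.out.println("INFO: disabled EC policy " + name);
        } else {
            // Logging only when state actually changes removes the noisy
            // duplicate INFO lines the issue describes.
            System.out.println("DEBUG: EC policy " + name + " is already disabled");
        }
        return changed;
    }

    public static void main(String[] args) {
        ECPolicyManager m = new ECPolicyManager();
        m.enablePolicy("RS-6-3-1024k");
        m.disablePolicy("RS-6-3-1024k"); // state change: INFO
        m.disablePolicy("RS-6-3-1024k"); // no change: DEBUG only
    }
}
```

With this shape, repeatedly disabling an already-disabled policy produces at most one INFO line, and the no-op case surfaces only at DEBUG.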
[jira] [Commented] (HDFS-15806) DeadNodeDetector should close all the threads when it is closed.
[ https://issues.apache.org/jira/browse/HDFS-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286871#comment-17286871 ] Jinglun commented on HDFS-15806: Hi [~ayushtkn], thanks for your comments! {quote}before this was there some kind of memory leak, or these threads were getting cleared later? {quote} At Xiaomi we use the dead node detector feature only for HBase. HBase doesn't close the file system or the dfs client, so we hadn't noticed the leak before. Recently we found the dead node detector won't remove alive nodes from the dead node set, as described in HDFS-15809. So I started reviewing the whole feature and found this leak bug. {quote}Secondly, for the shutdown is there some specific order, or it is just random {quote} It is random. Most of the threads are connected by queues (the producer-consumer model), so the order of stopping the producer or the consumer won't be a problem. 1) The DeadNodeDetector thread is responsible for adding nodes from the _suspectAndDeadNodes_ set to _deadNodesProbeQueue_. 2) The _probeDeadNodesSchedulerThr_ is responsible for taking nodes from _deadNodesProbeQueue_ and submitting probe tasks to _probeDeadNodesThreadPool_. 3) The _probeSuspectNodesSchedulerThr_ is responsible for taking nodes from _suspectNodesProbeQueue_ and submitting probe tasks to _probeSuspectNodesThreadPool_. 4) All the probe tasks submit getDatanodeInfo rpc calls in the thread pool _rpcThreadPool_. Some other thoughts: the thread model is a little complicated and could be improved. For example, I think we could do the rpc call in the probe task instead of submitting it to rpcThreadPool. I need to first figure out the purpose of the original design, then maybe start a new Jira for the thread improvement later. > DeadNodeDetector should close all the threads when it is closed. 
> > > Key: HDFS-15806 > URL: https://issues.apache.org/jira/browse/HDFS-15806 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jinglun > Assignee: Jinglun > Priority: Major > Attachments: HDFS-15806.001.patch > > > The DeadNodeDetector doesn't close all the threads when it is closed. This > Jira tries to fix this. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
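The four-step thread model described in the comment suggests why shutdown order doesn't matter: each stage only exchanges work through queues and pools. A hedged sketch of the close-everything pattern the fix calls for (class and field names echo the comment's description, not the actual DeadNodeDetector code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch only: every thread and pool the detector starts is kept as a field
// so close() can stop all of them, in any order.
public class DetectorSketch implements AutoCloseable {
    private final Thread probeSchedulerThr;
    private final ExecutorService probeThreadPool = Executors.newFixedThreadPool(2);
    private final ExecutorService rpcThreadPool = Executors.newFixedThreadPool(2);
    private volatile boolean running = true;

    public DetectorSketch() {
        probeSchedulerThr = new Thread(() -> {
            while (running && !Thread.currentThread().isInterrupted()) {
                try {
                    TimeUnit.MILLISECONDS.sleep(50); // stand-in for taking from the probe queue
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        probeSchedulerThr.start();
    }

    @Override
    public void close() {
        running = false;
        probeSchedulerThr.interrupt();  // stop the scheduler loop
        probeThreadPool.shutdownNow();  // cancel queued probe tasks
        rpcThreadPool.shutdownNow();    // cancel in-flight rpc tasks
        try {
            probeSchedulerThr.join(1000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public boolean fullyStopped() {
        return !probeSchedulerThr.isAlive()
            && probeThreadPool.isShutdown() && rpcThreadPool.isShutdown();
    }

    public static void main(String[] args) {
        DetectorSketch d = new DetectorSketch();
        d.close();
        System.out.println("fully stopped: " + d.fullyStopped());
    }
}
```

Because the stages are decoupled by queues, a stopped producer only means the consumer's queue stops growing, so the stops can be issued in any order, matching the comment.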
[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock
[ https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554604 ] ASF GitHub Bot logged work on HDFS-15781: - Author: ASF GitHub Bot Created on: 19/Feb/21 06:25 Start Date: 19/Feb/21 06:25 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2704: URL: https://github.com/apache/hadoop/pull/2704#issuecomment-781858678 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 34s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 32m 44s | | trunk passed | | +1 :green_heart: | compile | 1m 18s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 1m 14s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 6s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 20s | | trunk passed | | +1 :green_heart: | shadedclient | 15m 56s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 54s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 26s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +0 :ok: | spotbugs | 21m 23s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. 
| | +1 :green_heart: | spotbugs | 3m 7s | | trunk passed | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 9s | | the patch passed | | +1 :green_heart: | compile | 1m 11s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 1m 11s | | the patch passed | | +1 :green_heart: | compile | 1m 5s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | javac | 1m 5s | | the patch passed | | +1 :green_heart: | checkstyle | 0m 59s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 13s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | shadedclient | 12m 54s | | patch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 1m 23s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | -1 :x: | spotbugs | 3m 2s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs cannot run computeBugHistory from spotbugs | _ Other Tests _ | | -1 :x: | unit | 196m 10s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 42s | | The patch does not generate ASF License warnings. 
| | | | 282m 5s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks | | | hadoop.hdfs.TestDecommissionWithStriped | | | hadoop.hdfs.server.namenode.TestDecommissioningStatus | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/2704 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle | | uname | Linux 5aa96703b036 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 2970bd93f3e | | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | Multi-JDK versions
[jira] [Updated] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk intermittently
[ https://issues.apache.org/jira/browse/HDFS-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-15840: Summary: TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk intermittently (was: TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk) > TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock > fails on trunk intermittently > > > Key: HDFS-15840 > URL: https://issues.apache.org/jira/browse/HDFS-15840 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Takanobu Asanuma >Priority: Major > > Found from HDFS-15835. > {quote}java.lang.AssertionError: expected:<10> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910) > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk
[ https://issues.apache.org/jira/browse/HDFS-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-15840: Description: Found from HDFS-15835. {quote}java.lang.AssertionError: expected:<10> but was:<11> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910) {quote} was: {quote} java.lang.AssertionError: expected:<10> but was:<11> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {quote} > TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock > fails on trunk > 
- > > Key: HDFS-15840 > URL: https://issues.apache.org/jira/browse/HDFS-15840 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Takanobu Asanuma >Priority: Major > > Found from HDFS-15835. > {quote}java.lang.AssertionError: expected:<10> but was:<11> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at > org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910) > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk
Takanobu Asanuma created HDFS-15840: --- Summary: TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk Key: HDFS-15840 URL: https://issues.apache.org/jira/browse/HDFS-15840 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Takanobu Asanuma {quote} java.lang.AssertionError: expected:<10> but was:<11> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:834) at org.junit.Assert.assertEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:631) at org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging
[ https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286860#comment-17286860 ] Takanobu Asanuma commented on HDFS-15835: - [~bpatel] I added you to the Hadoop contributor role. You can assign JIRAs to yourself next time. > Erasure coding: Add/remove logs for the better readability/debugging > > > Key: HDFS-15835 > URL: https://issues.apache.org/jira/browse/HDFS-15835 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs > Reporter: Bhavik Patel > Assignee: Bhavik Patel > Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15835.001.patch > > > * Unnecessary Namenode logs are displayed when disabling EC policies that are > already disabled. > * There are no info/debug logs present for addPolicy, unsetPolicy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging
[ https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma reassigned HDFS-15835: --- Assignee: Bhavik Patel > Erasure coding: Add/remove logs for the better readability/debugging > > > Key: HDFS-15835 > URL: https://issues.apache.org/jira/browse/HDFS-15835 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs > Reporter: Bhavik Patel > Assignee: Bhavik Patel > Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15835.001.patch > > > * Unnecessary Namenode logs are displayed when disabling EC policies that are > already disabled. > * There are no info/debug logs present for addPolicy, unsetPolicy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging
[ https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-15835: Fix Version/s: 3.4.0 3.3.1 Resolution: Fixed Status: Resolved (was: Patch Available) The failed test is unrelated. Committed to trunk and branch-3.3. Thanks for your contribution, [~bpatel]. > Erasure coding: Add/remove logs for the better readability/debugging > > > Key: HDFS-15835 > URL: https://issues.apache.org/jira/browse/HDFS-15835 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs > Reporter: Bhavik Patel > Priority: Minor > Fix For: 3.3.1, 3.4.0 > > Attachments: HDFS-15835.001.patch > > > * Unnecessary Namenode logs are displayed when disabling EC policies that are > already disabled. > * There are no info/debug logs present for addPolicy, unsetPolicy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging
[ https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286853#comment-17286853 ] Takanobu Asanuma commented on HDFS-15835: - +1 on [^HDFS-15835.001.patch]. I will fix the checkstyle issue when committing it. > Erasure coding: Add/remove logs for the better readability/debugging > > > Key: HDFS-15835 > URL: https://issues.apache.org/jira/browse/HDFS-15835 > Project: Hadoop HDFS > Issue Type: Improvement > Components: erasure-coding, hdfs > Reporter: Bhavik Patel > Priority: Minor > Attachments: HDFS-15835.001.patch > > > * Unnecessary Namenode logs are displayed when disabling EC policies that are > already disabled. > * There are no info/debug logs present for addPolicy, unsetPolicy -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.
[ https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286832#comment-17286832 ] Jinglun edited comment on HDFS-15809 at 2/19/21, 3:23 AM: -- Hi [~leosun08], thanks for your comments. The solution in v01 introduces a new deduplicated queue. It won't accept duplicated nodes being added. The size of the queue is not fixed either, so all the dead nodes can be added to the deduplicated queue. Thus the situation of duplicated dead nodes being repeatedly added to the probe queue won't happen anymore. The queue itself is deduplicated, so we don't need to worry about the queue size exploding: the size is no greater than the number of datanodes. Shuffle is a good idea and is a much simpler way. But I think the deduplicated way is more efficient because there are no duplicated probes. Adjusting the queue size won't fix the problem because the queue accepts duplicated nodes. Even if the queue size is 10, it could still be filled up with the first 30 nodes. was (Author: lijinglun): Hi [~leosun08], thanks for your comments. The solution in v01 is to avoid adding duplicated dead nodes to the probe queue, so the queue won't be filled up with duplicated dead nodes. Shuffle is a good idea and is a much simpler way. I also agree with the shuffle way. > DeadNodeDetector doesn't remove live nodes from dead node set. > -- > > Key: HDFS-15809 > URL: https://issues.apache.org/jira/browse/HDFS-15809 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jinglun > Assignee: Jinglun > Priority: Major > Attachments: HDFS-15809.001.patch > > > We found the dead node detector might never remove the alive nodes from the > dead node set in a big cluster. For example: > # 200 nodes are added to the dead node set by DeadNodeDetector. > # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the > deadNodesProbeQueue because the queue's length limit is 100. > # The probe threads start working and probe 30 nodes. 
> # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead > node set and adds 30 nodes to the deadNodesProbeQueue. But the order is the > same as the last time, so the 30 nodes that have already been probed are added > to the queue again. > # Repeat 3 and 4. But we always add the first 30 nodes from the dead set. If > they are all dead then the live nodes behind them can never be recovered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
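The deduplicated queue described above can be sketched as a set-backed queue: an offer of a node that is already queued is rejected, so the queue size is bounded by the number of distinct dead nodes. This is an illustration of the data structure, not the v01 patch itself:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Sketch of a deduplicated probe queue: a companion set tracks which nodes
// are currently queued, so duplicate offers are no-ops and the queue can
// never be filled up by re-adding the same dead nodes.
public class DedupQueue<T> {
    private final Queue<T> queue = new ArrayDeque<>();
    private final Set<T> present = new HashSet<>();

    public synchronized boolean offer(T node) {
        if (!present.add(node)) {
            return false; // already queued; skip the duplicate probe
        }
        queue.add(node);
        return true;
    }

    public synchronized T poll() {
        T node = queue.poll();
        if (node != null) {
            present.remove(node); // a polled node may be re-offered later
        }
        return node;
    }

    public synchronized int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        DedupQueue<String> q = new DedupQueue<>();
        System.out.println(q.offer("dn1")); // true: first offer accepted
        System.out.println(q.offer("dn1")); // false: duplicate rejected
    }
}
```

Because `present` mirrors the queue's contents, the size can never exceed the number of distinct datanodes, which is the "no size explosion" argument in the comment.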
[jira] [Commented] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.
[ https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286832#comment-17286832 ] Jinglun commented on HDFS-15809: Hi [~leosun08], thanks for your comments. The solution in v01 is to avoid adding duplicated dead nodes to the probe queue, so the queue won't be filled up with duplicated dead nodes. Shuffle is a good idea and is a much simpler way. I also agree with the shuffle way. > DeadNodeDetector doesn't remove live nodes from dead node set. > -- > > Key: HDFS-15809 > URL: https://issues.apache.org/jira/browse/HDFS-15809 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Jinglun > Assignee: Jinglun > Priority: Major > Attachments: HDFS-15809.001.patch > > > We found the dead node detector might never remove the alive nodes from the > dead node set in a big cluster. For example: > # 200 nodes are added to the dead node set by DeadNodeDetector. > # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the > deadNodesProbeQueue because the queue's length limit is 100. > # The probe threads start working and probe 30 nodes. > # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates the dead > node set and adds 30 nodes to the deadNodesProbeQueue. But the order is the > same as the last time, so the 30 nodes that have already been probed are added > to the queue again. > # Repeat 3 and 4. But we always add the first 30 nodes from the dead set. If > they are all dead then the live nodes behind them can never be recovered. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
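The shuffle alternative mentioned here can be sketched as follows: take a snapshot of the dead-node set, shuffle it, and fill the free queue slots from the shuffled order, so no fixed prefix of the set monopolizes the probe slots. Illustrative only, not the actual patch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the shuffle idea: iterating the dead-node set in a fixed order
// means the same first N nodes always win the limited queue slots; shuffling
// a snapshot lets every node eventually get probed.
public class ShuffleEnqueue {
    static List<String> pickForProbe(Set<String> deadNodes, int freeSlots) {
        List<String> snapshot = new ArrayList<>(deadNodes);
        Collections.shuffle(snapshot); // randomize the probe order each round
        return snapshot.subList(0, Math.min(freeSlots, snapshot.size()));
    }

    public static void main(String[] args) {
        Set<String> dead = new TreeSet<>();
        for (int i = 0; i < 200; i++) {
            dead.add("dn" + i);
        }
        // With only 100 slots, an unshuffled iteration would pick the same
        // 100 nodes on every scheduling round.
        System.out.println(pickForProbe(dead, 100).size());
    }
}
```

The trade-off against the deduplicated queue is exactly as the comment says: shuffling is simpler, but duplicates can still occupy slots within a round, so some probes are wasted.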
[jira] [Comment Edited] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286811#comment-17286811 ] tomscut edited comment on HDFS-15808 at 2/19/21, 2:29 AM: -- Hi [~shv] . Thank you for your reply and suggestions. These two metrics are indeed incrementing, similar to RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate of change or the amount of change, and then set the alarm based on that. We can combine those metrics with lock-detailed-metrics(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance. For example, we use Prometheus to store metrics and use the expression "delta(hadoop_fsNamesystem_writeLockLongholdCount\{instance=~"$hosts"})[1m]" for monitoring. The following is a graph of monitoring data. [^lockLongHoldCount] was (Author: tomscut): Hi [~shv] . Thank you for your reply and suggestions. These two metrics are indeed incrementing, similar to RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate of change or the amount of change, and then set the alarm based on that. We can combine those metrics with lock-detailed-metrics(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance. For example, we use Prometheus to store metrics and use the expression "delta(hadoop_fsNamesystem_writeLockLongholdCount\{instance=~"$hosts"})[1m]" for monitoring. [^lockLongHoldCount] > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=554555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554555 ] ASF GitHub Bot logged work on HDFS-15808: - Author: ASF GitHub Bot Created on: 19/Feb/21 01:57 Start Date: 19/Feb/21 01:57 Worklog Time Spent: 10m Work Description: tomscut commented on pull request #2668: URL: https://github.com/apache/hadoop/pull/2668#issuecomment-781760195 > Just reposting my comment from the jira for visibility. > > The patch looks fine, but I doubt the metric will be useful in its current form. Monotonically increasing counter doesn't tell you much when plotted. Over time it just becomes an incredibly large number, hard to see its fluctuations. And you cannot set alerts if the threshold is exceeded often. > See e.g. ExpiredHeartbeats or LastWrittenTransactionId - not useful. > I assume you need something like a rate. Hey @shvachko , thank you for your comments and suggestions. I replied to you in JIRA. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554555) Time Spent: 4.5h (was: 4h 20m) > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4.5h > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286811#comment-17286811 ] tomscut edited comment on HDFS-15808 at 2/19/21, 1:54 AM: -- Hi [~shv] . Thank you for your reply and suggestions. These two metrics are indeed incrementing, similar to RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate of change or the amount of change, and then set the alarm based on that. We can combine those metrics with lock-detailed-metrics(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance. For example, we use Prometheus to store metrics and use the expression "delta(hadoop_fsNamesystem_writeLockLongholdCount\{instance=~"$hosts"})[1m]" for monitoring. [^lockLongHoldCount] was (Author: tomscut): Hi [~shv] . Thank you for your reply and suggestions. These two metrics are indeed incrementing, similar to RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate of change or the amount of change, and then set the alarm based on that. We can combine those metrics with lock-detailed-metrics(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance. For example, we use Prometheus to store metrics and use the expression "delta(hadoop_fsNamesystem_writeLockLongholdCount\{instance=~"$hosts"})[1m]" for monitoring. [^lockLongHoldCount] > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4h 20m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tomscut updated HDFS-15808: --- Attachment: lockLongHoldCount > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs >Reporter: tomscut >Assignee: tomscut >Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4h 20m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286811#comment-17286811 ] tomscut commented on HDFS-15808: Hi [~shv]. Thank you for your reply and suggestions. These two metrics are indeed incrementing, similar to RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate of change or the amount of change, and then set alerts based on that. We can combine those metrics with the lock detailed metrics (https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance. For example, we use Prometheus to store metrics and use the expression "delta(hadoop_fsNamesystem_writeLockLongholdCount\{instance=~"$hosts"}[1m])" for monitoring. [^lockLongHoldCount] > Add metrics for FSNamesystem read/write lock hold long time > --- > > Key: HDFS-15808 > URL: https://issues.apache.org/jira/browse/HDFS-15808 > Project: Hadoop HDFS > Issue Type: Wish > Components: hdfs > Reporter: tomscut > Assignee: tomscut > Priority: Major > Labels: hdfs, lock, metrics, pull-request-available > Attachments: lockLongHoldCount > > Time Spent: 4h 20m > Remaining Estimate: 0h > > To monitor how often read/write locks exceed thresholds, we can add two > metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
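The point about alerting on a monotonically increasing counter can be illustrated client-side: keep timestamped samples and alert on the increase over a window, which is what the quoted Prometheus delta expression computes on the server. The class and method names below are illustrative, not a Hadoop or Prometheus API:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: a windowed delta over samples of an ever-growing counter.
// This is the client-side equivalent of Prometheus delta(metric[1m]).
public class CounterDelta {
    private final Deque<long[]> samples = new ArrayDeque<>(); // {timestampMs, value}
    private final long windowMs;

    public CounterDelta(long windowMs) {
        this.windowMs = windowMs;
    }

    public void record(long timestampMs, long counterValue) {
        samples.addLast(new long[] {timestampMs, counterValue});
        // Drop samples that have fallen out of the window.
        while (!samples.isEmpty() && samples.peekFirst()[0] < timestampMs - windowMs) {
            samples.removeFirst();
        }
    }

    /** Increase of the counter over the retained window. */
    public long delta() {
        if (samples.size() < 2) {
            return 0;
        }
        return samples.peekLast()[1] - samples.peekFirst()[1];
    }

    public static void main(String[] args) {
        CounterDelta d = new CounterDelta(60_000);
        d.record(0, 10);
        d.record(30_000, 14);
        d.record(90_000, 15);
        System.out.println("1m delta: " + d.delta());
    }
}
```

An alert can then fire whenever the windowed delta exceeds a threshold, even though the raw counter itself only ever grows.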
[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554548 ] ASF GitHub Bot logged work on HDFS-15830: - Author: ASF GitHub Bot Created on: 19/Feb/21 01:31 Start Date: 19/Feb/21 01:31 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #2694: URL: https://github.com/apache/hadoop/pull/2694#issuecomment-781751448 cherry-picked to branch-3.3 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554548) Time Spent: 1h 10m (was: 1h) > Support to make dfs.image.parallel.load reconfigurable > -- > > Key: HDFS-15830 > URL: https://issues.apache.org/jira/browse/HDFS-15830 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Hui Fei >Assignee: Hui Fei >Priority: Major > Labels: pull-request-available > Fix For: 3.3.1, 3.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > After HDFS-14617, loading the fsimage improves a lot. > If something unexpected happens, we have to load an old image to restart the > namenode. > So we advise making dfs.image.parallel.load reconfigurable, so that we can > save a new fsimage.
[jira] [Resolved] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei resolved HDFS-15830. Fix Version/s: 3.4.0 3.3.1 Resolution: Fixed
[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554542 ] ASF GitHub Bot logged work on HDFS-15830: - Author: ASF GitHub Bot Created on: 19/Feb/21 01:07 Start Date: 19/Feb/21 01:07 Worklog Time Spent: 10m Work Description: ferhui commented on pull request #2694: URL: https://github.com/apache/hadoop/pull/2694#issuecomment-781742400 @sodonnel @dineshchitlangia Thanks for review ! merged to trunk Issue Time Tracking --- Worklog Id: (was: 554542) Time Spent: 50m (was: 40m)
[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable
[ https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554544 ] ASF GitHub Bot logged work on HDFS-15830: - Author: ASF GitHub Bot Created on: 19/Feb/21 01:07 Start Date: 19/Feb/21 01:07 Worklog Time Spent: 10m Work Description: ferhui merged pull request #2694: URL: https://github.com/apache/hadoop/pull/2694 Issue Time Tracking --- Worklog Id: (was: 554544) Time Spent: 1h (was: 50m)
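The HDFS-15830 thread above is about making dfs.image.parallel.load changeable at runtime. As a minimal sketch of that reconfiguration pattern (this is illustrative code, not the actual Hadoop ReconfigurableBase implementation), the running service reads a flag that an admin operation can flip without a restart:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a runtime-reconfigurable boolean property. The class and
// method names here are hypothetical; Hadoop's real mechanism goes
// through ReconfigurableBase and the dfsadmin -reconfig command.
public class ReconfigurableFlag {

    // Atomic so a change made by the admin thread is immediately
    // visible to the thread that saves the fsimage.
    private final AtomicBoolean parallelLoad = new AtomicBoolean(true);

    // Called with the new property value from the reconfiguration request.
    public void reconfigure(String newValue) {
        parallelLoad.set(Boolean.parseBoolean(newValue));
    }

    public boolean isParallelLoadEnabled() {
        return parallelLoad.get();
    }

    public static void main(String[] args) {
        ReconfigurableFlag flag = new ReconfigurableFlag();
        flag.reconfigure("false"); // admin disables parallel load at runtime
        System.out.println(flag.isParallelLoadEnabled()); // prints false
    }
}
```

The point of the JIRA is exactly this: if a parallel-written image turns out to be problematic, the property can be flipped and a new serial image saved without restarting the NameNode.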
[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes
[ https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554538 ] ASF GitHub Bot logged work on HDFS-15785: - Author: ASF GitHub Bot Created on: 19/Feb/21 00:53 Start Date: 19/Feb/21 00:53 Worklog Time Spent: 10m Work Description: LeonGao91 edited a comment on pull request #2639: URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781734960 @fengnanli @goiri Could you help to take a look the change? (Somehow Jenkins is starting to use spotbug but it is not working..) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554538) Time Spent: 2h 20m (was: 2h 10m) > Datanode to support using DNS to resolve nameservices to IP addresses to get > list of namenodes > -- > > Key: HDFS-15785 > URL: https://issues.apache.org/jira/browse/HDFS-15785 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently as HDFS supports observers, multiple-standby and router, the > namenode hosts are changing frequently in large deployment, we can consider > supporting https://issues.apache.org/jira/browse/HDFS-14118 on datanode to > reduce the need to update config frequently on all datanodes. In that case, > datanode and clients can use the same set of config as well. > Basically we can resolve the DNS and generate namenode for each IP behind it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes
[ https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554536 ] ASF GitHub Bot logged work on HDFS-15785: - Author: ASF GitHub Bot Created on: 19/Feb/21 00:48 Start Date: 19/Feb/21 00:48 Worklog Time Spent: 10m Work Description: LeonGao91 commented on pull request #2639: URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781734960 Somehow it is starting to use spotbugs and it is not working (some changes on Yetus?).. @fengnanli @goiri Could you help to take a look? Issue Time Tracking --- Worklog Id: (was: 554536) Time Spent: 2h 10m (was: 2h)
[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock
[ https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554532 ] ASF GitHub Bot logged work on HDFS-15781: - Author: ASF GitHub Bot Created on: 19/Feb/21 00:24 Start Date: 19/Feb/21 00:24 Worklog Time Spent: 10m Work Description: LeonGao91 commented on a change in pull request #2704: URL: https://github.com/apache/hadoop/pull/2704#discussion_r578840318 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java ## @@ -188,6 +188,15 @@ @Metric MutableCounterLong packetsSlowWriteToDisk; @Metric MutableCounterLong packetsSlowWriteToOsCache; + @Metric("Number of replaceBlock ops between" + + " storage types on same host with local copy") + private MutableCounterLong replaceBlockOpOnSameHostWithCopy; + @Metric("Number of replaceBlock ops between" + + " storage types on same disk mount using hardlink") + private MutableCounterLong replaceBlockOpOnSameHostWithHardlink; Review comment: Yeah sounds good, OnSameHost can include OnSameMount actually. I will make the change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 554532) Time Spent: 50m (was: 40m) > Add metrics for how blocks are moved in replaceBlock > > > Key: HDFS-15781 > URL: https://issues.apache.org/jira/browse/HDFS-15781 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Leon Gao >Assignee: Leon Gao >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > We can add some metrics for to track how the blocks are being moved, to get > a sense of the locality of movements. > * How many blocks copied to local host? 
> * How many blocks moved to local disk thru hardlink? > * How many blocks are copied out of the host
[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client
[ https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286779#comment-17286779 ] Hadoop QA commented on HDFS-15839: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Logfile || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 35s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} yetus {color} | {color:red} 0m 8s{color} | {color:red}{color} | {color:red} Unprocessed flag(s): --findbugs-strict-precheck {color} | \\ \\ || Subsystem || Report/Notes || | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/480/artifact/out/Dockerfile | | JIRA Issue | HDFS-15839 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13020660/HDFS-15839.001.patch | | Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/480/console | | versions | git=2.25.1 | | Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org | This message was automatically generated. 
> RBF: Cannot get method setBalancerBandwidth on Router Client > > > Key: HDFS-15839 > URL: https://issues.apache.org/jira/browse/HDFS-15839 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: Yang Yun >Assignee: Yang Yun >Priority: Major > Attachments: HDFS-15839.001.patch, HDFS-15839.patch > > > When call setBalancerBandwidth, throw exeption, > {code:java} > 02-18 14:39:59,186 [IPC Server handler 0 on default port 43545] ERROR > router.RemoteMethod (RemoteMethod.java:getMethod(146)) - Cannot get method > setBalancerBandwidth with types [class java.lang.Long] from > ClientProtocoljava.lang.NoSuchMethodException: > org.apache.hadoop.hdfs.protocol.ClientProtocol.setBalancerBandwidth(java.lang.Long) > at java.lang.Class.getDeclaredMethod(Class.java:2130) at > org.apache.hadoop.hdfs.server.federation.router.RemoteMethod.getMethod(RemoteMethod.java:140) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1312) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1250) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1221) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1194) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.setBalancerBandwidth(RouterClientProtocol.java:1188) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setBalancerBandwidth(RouterRpcServer.java:1211) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setBalancerBandwidth(ClientNamenodeProtocolServerSideTranslatorPB.java:1254) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:537) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1037) at > org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:965) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:422) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2972){code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
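The stack trace above shows `Class.getDeclaredMethod` failing with the parameter type `java.lang.Long`, which points at the classic boxed-versus-primitive mismatch in reflective lookups: `ClientProtocol#setBalancerBandwidth` is declared with the primitive `long`, so a lookup that passes `Long.class` cannot find it. A minimal sketch reproducing the mismatch (the `Protocol` interface below is a stand-in for illustration, not the real `ClientProtocol`):

```java
import java.lang.reflect.Method;

public class ReflectiveLookup {

    // Stand-in for ClientProtocol: the real method takes the primitive long.
    interface Protocol {
        void setBalancerBandwidth(long bandwidth);
    }

    // Returns true when a setBalancerBandwidth method with the given
    // parameter type exists on the interface.
    static boolean hasMethod(Class<?> paramType) {
        try {
            Method m = Protocol.class.getDeclaredMethod(
                "setBalancerBandwidth", paramType);
            return m != null;
        } catch (NoSuchMethodException e) {
            // Same exception as in the router's RemoteMethod.getMethod.
            return false;
        }
    }

    public static void main(String[] args) {
        // Boxed type: lookup fails, matching the stack trace above.
        System.out.println(hasMethod(Long.class)); // prints false
        // Primitive type: lookup succeeds.
        System.out.println(hasMethod(long.class)); // prints true
    }
}
```

This is why a fix along the lines of passing `long.class` (rather than `Long.class`) to the router's `RemoteMethod` resolves the error.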
[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client
[ https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286778#comment-17286778 ] Yang Yun commented on HDFS-15839: - Thanks [~ayushtkn] for your review. Update to HDFS-15839.001.patch to simplify test.
[jira] [Updated] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client
[ https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-15839: Attachment: HDFS-15839.001.patch Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client
[ https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Yun updated HDFS-15839: Status: Open (was: Patch Available)
[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes
[ https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554520 ] ASF GitHub Bot logged work on HDFS-15785: - Author: ASF GitHub Bot Created on: 18/Feb/21 23:30 Start Date: 18/Feb/21 23:30 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on pull request #2639: URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781703710 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 33s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | | 0m 0s | [test4tests](test4tests) | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 14m 34s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 20m 24s | | trunk passed | | +1 :green_heart: | compile | 20m 37s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | compile | 17m 56s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | checkstyle | 3m 57s | | trunk passed | | +1 :green_heart: | mvnsite | 4m 12s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 52s | | branch has no errors when building and testing our client artifacts. | | +1 :green_heart: | javadoc | 3m 3s | | trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 4m 4s | | trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +0 :ok: | spotbugs | 37m 5s | | Both FindBugs and SpotBugs are enabled, using SpotBugs. 
| | +1 :green_heart: | spotbugs | 8m 6s | | trunk passed | _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 53s | | the patch passed | | +1 :green_heart: | compile | 19m 54s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javac | 19m 54s | | root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1957 unchanged - 2 fixed = 1957 total (was 1959) | | +1 :green_heart: | compile | 17m 59s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | +1 :green_heart: | javac | 17m 59s | | root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 0 new + 1852 unchanged - 2 fixed = 1852 total (was 1854) | | +1 :green_heart: | checkstyle | 3m 55s | | root: The patch generated 0 new + 557 unchanged - 2 fixed = 557 total (was 559) | | +1 :green_heart: | mvnsite | 4m 6s | | the patch passed | | +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. | | +1 :green_heart: | xml | 0m 1s | | The patch has no ill-formed XML file. | | +1 :green_heart: | shadedclient | 13m 15s | | patch has no errors when building and testing our client artifacts. 
| | +1 :green_heart: | javadoc | 3m 3s | | the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 | | +1 :green_heart: | javadoc | 4m 3s | | the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 | | -1 :x: | spotbugs | 2m 18s | [/patch-spotbugs-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common cannot run computeBugHistory from spotbugs | | -1 :x: | spotbugs | 2m 36s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.txt) | hadoop-hdfs-project/hadoop-hdfs-client cannot run computeBugHistory from spotbugs | | -1 :x: | spotbugs | 3m 14s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs cannot run computeBugHistory from spotbugs | _ Other Tests _ | | +1 :green_heart: | unit | 17m 24s | | hadoop-common in the patch passed. | | +1 :green_heart:
[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock
[ https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554517 ] ASF GitHub Bot logged work on HDFS-15781: - Author: ASF GitHub Bot Created on: 18/Feb/21 23:19 Start Date: 18/Feb/21 23:19 Worklog Time Spent: 10m Work Description: Jing9 commented on a change in pull request #2704: URL: https://github.com/apache/hadoop/pull/2704#discussion_r578815508 ## File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java ## @@ -188,6 +188,15 @@ @Metric MutableCounterLong packetsSlowWriteToDisk; @Metric MutableCounterLong packetsSlowWriteToOsCache; + @Metric("Number of replaceBlock ops between" + + " storage types on same host with local copy") + private MutableCounterLong replaceBlockOpOnSameHostWithCopy; + @Metric("Number of replaceBlock ops between" + + " storage types on same disk mount using hardlink") + private MutableCounterLong replaceBlockOpOnSameHostWithHardlink; Review comment: Both "withHardlink" and "withCopy" are our block movement implementation. If in the future we change our implementation these names may no long hold. How about we change the metric names to "OnSameHost" and "OnSameMount" ? But then we need to think more about their semantic meanings. Maybe "OnSameHost" also includes the "OnSameMount"... Thoughts? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
Issue Time Tracking --- Worklog Id: (was: 554517) Time Spent: 40m (was: 0.5h)
[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time
[ https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=554448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554448 ]

ASF GitHub Bot logged work on HDFS-15808:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 18/Feb/21 20:18
Start Date: 18/Feb/21 20:18
Worklog Time Spent: 10m

Work Description: shvachko edited a comment on pull request #2668:
URL: https://github.com/apache/hadoop/pull/2668#issuecomment-781609671

Just reposting my comment from the jira for visibility.
The patch looks fine, but I doubt the metric will be useful in its current form. A monotonically increasing counter doesn't tell you much when plotted. Over time it just becomes an incredibly large number, and it is hard to see its fluctuations. And you cannot set alerts for when the threshold is exceeded often. See e.g. ExpiredHeartbeats or LastWrittenTransactionId - not useful. I assume you need something like a rate.

Issue Time Tracking
-------------------
Worklog Id: (was: 554448)
Time Spent: 4h 20m (was: 4h 10m)

> Add metrics for FSNamesystem read/write lock hold long time
> -----------------------------------------------------------
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
> Issue Type: Wish
> Components: hdfs
> Reporter: tomscut
> Assignee: tomscut
> Priority: Major
> Labels: hdfs, lock, metrics, pull-request-available
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.
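The objection above (a monotonically increasing counter is hard to plot or alert on) is usually addressed by sampling the counter periodically and reporting the delta per interval. A minimal, self-contained sketch of that idea follows; `CounterRate` and its method names are illustrative, not part of the Hadoop metrics2 API:

```java
// Hedged sketch: deriving a per-second rate from a monotonically
// increasing counter such as the proposed ReadLockWarning metric.
public class CounterRate {
    private long lastValue;
    private long lastTimeMs;

    public CounterRate(long initialValue, long nowMs) {
        this.lastValue = initialValue;
        this.lastTimeMs = nowMs;
    }

    /** Returns events per second observed since the previous sample. */
    public double sample(long value, long nowMs) {
        long deltaEvents = value - lastValue;
        long deltaMs = nowMs - lastTimeMs;
        lastValue = value;
        lastTimeMs = nowMs;
        return deltaMs <= 0 ? 0.0 : (deltaEvents * 1000.0) / deltaMs;
    }

    public static void main(String[] args) {
        CounterRate r = new CounterRate(100, 0);
        // 60 new warnings over a 60-second window = 1 warning/second.
        System.out.println(r.sample(160, 60_000)); // 1.0
    }
}
```

A rate like this stays bounded over time, so dashboards show fluctuations and alerts can fire on a simple threshold, which is exactly what the raw counter cannot provide.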
[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client
[ https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286568#comment-17286568 ]

Ayush Saxena commented on HDFS-15839:
-------------------------------------
Thanks [~hadoop_yangyun] for the patch. The prod change looks good, but regarding the test, I'm not sure what you want to do with the caller context there. I think that isn't required; just checking that setBalancerBandwidth worked is fine.

> RBF: Cannot get method setBalancerBandwidth on Router Client
> ------------------------------------------------------------
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Reporter: Yang Yun
> Assignee: Yang Yun
> Priority: Major
> Attachments: HDFS-15839.patch
>
> When calling setBalancerBandwidth, an exception is thrown:
> {code:java}
> 02-18 14:39:59,186 [IPC Server handler 0 on default port 43545] ERROR router.RemoteMethod (RemoteMethod.java:getMethod(146)) - Cannot get method setBalancerBandwidth with types [class java.lang.Long] from ClientProtocol
> java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.protocol.ClientProtocol.setBalancerBandwidth(java.lang.Long)
>     at java.lang.Class.getDeclaredMethod(Class.java:2130)
>     at org.apache.hadoop.hdfs.server.federation.router.RemoteMethod.getMethod(RemoteMethod.java:140)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1312)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1250)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1221)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1194)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.setBalancerBandwidth(RouterClientProtocol.java:1188)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setBalancerBandwidth(RouterRpcServer.java:1211)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setBalancerBandwidth(ClientNamenodeProtocolServerSideTranslatorPB.java:1254)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:537)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1037)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:965)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2972)
> {code}
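The `NoSuchMethodException` above is the classic boxed-vs-primitive reflection mismatch: `ClientProtocol.setBalancerBandwidth` takes a primitive `long`, but the reflective lookup was performed with the wrapper type `java.lang.Long`. The tiny self-contained sketch below reproduces the failure mode; the `Target` interface is a hypothetical stand-in for `ClientProtocol`, not Hadoop code:

```java
// Hedged sketch: getDeclaredMethod matches parameter types exactly, so
// Long.class does not find a method declared with a primitive long.
public class ReflectionLookupDemo {
    // Stand-in for ClientProtocol's primitive-long signature.
    interface Target {
        void setBalancerBandwidth(long bandwidth);
    }

    // Returns true iff a method with the given parameter type is found.
    public static boolean lookup(Class<?> paramType) {
        try {
            Target.class.getDeclaredMethod("setBalancerBandwidth", paramType);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Long.class matches: " + lookup(Long.class)); // false
        System.out.println("long.class matches: " + lookup(long.class)); // true
    }
}
```

The fix on the Router side is therefore to construct the `RemoteMethod` with `long.class` (i.e. `Long.TYPE`) rather than `Long.class` for the parameter type.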