[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging

2021-02-18 Thread Bhavik Patel (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286879#comment-17286879
 ] 

Bhavik Patel commented on HDFS-15835:
-

Thank you [~tasanuma]

> Erasure coding: Add/remove logs for the better readability/debugging
> 
>
> Key: HDFS-15835
> URL: https://issues.apache.org/jira/browse/HDFS-15835
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15835.001.patch
>
>
> * Unnecessary NameNode logs are displayed when disabling EC policies that are
> already disabled.
> * There are no info/debug logs for addPolicy and unsetPolicy.
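The two points above can be illustrated with a small, hypothetical sketch. It is not the code in HDFS-15835.001.patch: `EcPolicyRegistrySketch` and its methods are invented for the example, `java.util.logging` stands in for the NameNode's own logger, and newly added policies start disabled in this sketch. The idea is to log at INFO only when a call actually changes state, keep no-op calls at a debug level, and log addPolicy and unsetPolicy explicitly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical registry illustrating "log only real state changes"; not HDFS code. */
class EcPolicyRegistrySketch {
  private static final Logger LOG = Logger.getLogger(EcPolicyRegistrySketch.class.getName());

  private final Map<String, Boolean> policies = new HashMap<>();   // policy name -> enabled?
  private final Map<String, String> dirPolicies = new HashMap<>(); // directory -> policy name

  /** Adds a new, initially disabled policy; logs the addition at INFO. */
  boolean addPolicy(String name) {
    if (policies.putIfAbsent(name, false) != null) {
      LOG.info("Erasure coding policy " + name + " already exists, nothing added");
      return false;
    }
    LOG.info("Added erasure coding policy " + name);
    return true;
  }

  boolean enablePolicy(String name) {
    if (requireKnown(name)) {
      LOG.fine("Policy " + name + " is already enabled");  // debug level: no INFO noise
      return false;
    }
    policies.put(name, true);
    LOG.info("Enabled erasure coding policy " + name);
    return true;
  }

  /** Returns false without an INFO log when the policy is already disabled. */
  boolean disablePolicy(String name) {
    if (!requireKnown(name)) {
      LOG.fine("Policy " + name + " is already disabled");
      return false;
    }
    policies.put(name, false);
    LOG.info("Disabled erasure coding policy " + name);
    return true;
  }

  void setPolicy(String dir, String name) {
    requireKnown(name);
    dirPolicies.put(dir, name);
    LOG.info("Set erasure coding policy " + name + " on " + dir);
  }

  /** Logs which policy was removed from the directory, or stays quiet if none was set. */
  boolean unsetPolicy(String dir) {
    String removed = dirPolicies.remove(dir);
    if (removed == null) {
      LOG.fine("No erasure coding policy set on " + dir);
      return false;
    }
    LOG.info("Unset erasure coding policy " + removed + " on " + dir);
    return true;
  }

  /** Returns the enabled flag, or throws if the policy was never added. */
  private boolean requireKnown(String name) {
    Boolean enabled = policies.get(name);
    if (enabled == null) {
      throw new IllegalArgumentException("Unknown erasure coding policy " + name);
    }
    return enabled;
  }
}
```

Returning a boolean from each call also lets callers (and tests) tell a real change apart from a no-op without parsing log output.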



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15806) DeadNodeDetector should close all the threads when it is closed.

2021-02-18 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286871#comment-17286871
 ] 

Jinglun commented on HDFS-15806:


Hi [~ayushtkn], thanks for your comments!
{quote}before this was there some kind of memory leak, or these threads were 
getting cleared later?
{quote}
At Xiaomi we use the dead node detector feature only for HBase. HBase doesn't
close the file system or the DFS client, so we hadn't noticed the leak before.
Recently we found that the dead node detector doesn't remove alive nodes from the
dead node set, as described in HDFS-15809. So I started reviewing the whole
feature and found this leak.
{quote}Secondly, for the shutdown is there some specific order, or it is just 
random
{quote}
It is random. Most of the threads are connected by queues (the producer-consumer
model), so the order of stopping the producer or the consumer doesn't matter.

1) The DeadNodeDetector thread is responsible for adding nodes from the
_suspectAndDeadNodes_ set to _deadNodesProbeQueue_.

2) The _probeDeadNodesSchedulerThr_ is responsible for taking nodes from
_deadNodesProbeQueue_ and submitting probe tasks to _probeDeadNodesThreadPool_.

3) The _probeSuspectNodesSchedulerThr_ is responsible for taking nodes from
_suspectNodesProbeQueue_ and submitting probe tasks to
_probeSuspectNodesThreadPool_.

4) All the probe tasks submit getDatanodeInfo RPC calls to the thread pool
_rpcThreadPool_.

Some other thoughts: the thread model is a little complicated and could be
improved. For example, I think we could make the RPC call in the probe task
itself instead of submitting it to rpcThreadPool. I first need to figure out the
purpose of the original design, and then maybe start a new Jira for the thread
improvement later.
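The shutdown requirement can be sketched for a single producer-consumer stage of the kind listed above. This is a hypothetical illustration, not DeadNodeDetector's code: `ProbeStageSketch`, `enqueue`, and `probe` are invented names, and the probe only increments a counter where the real detector would issue a getDatanodeInfo RPC. In this sketch close() stops the scheduler thread before the pool it feeds, which guarantees nothing is submitted to an already shut-down pool, and afterwards no thread of the stage is left running.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single producer-consumer stage; not the real DeadNodeDetector. */
class ProbeStageSketch implements AutoCloseable {
  private final BlockingQueue<String> probeQueue = new LinkedBlockingQueue<>();
  private final ExecutorService probePool = Executors.newFixedThreadPool(2);
  private final AtomicInteger probed = new AtomicInteger();
  private final Thread scheduler;

  ProbeStageSketch() {
    // Consumer: takes queued nodes and hands each one to the probe pool.
    scheduler = new Thread(() -> {
      try {
        while (!Thread.currentThread().isInterrupted()) {
          String node = probeQueue.take();  // blocks until a node is queued
          probePool.submit(() -> probe(node));
        }
      } catch (InterruptedException e) {
        // close() interrupted take(): let the thread end
      }
    }, "probe-scheduler");
    scheduler.setDaemon(true);
    scheduler.start();
  }

  /** Producer side: queue a node for probing. */
  void enqueue(String node) {
    probeQueue.offer(node);
  }

  private void probe(String node) {
    probed.incrementAndGet();  // placeholder for the real RPC to the datanode
  }

  int probedCount() {
    return probed.get();
  }

  /** Polls until at least n probes have run or the timeout expires. */
  boolean awaitProbed(int n, long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (probed.get() < n && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    return probed.get() >= n;
  }

  /** Stops the scheduler thread first, then every thread in the pool it feeds. */
  @Override
  public void close() {
    scheduler.interrupt();
    try {
      scheduler.join(5_000);
      probePool.shutdownNow();
      probePool.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      probePool.shutdownNow();
      Thread.currentThread().interrupt();
    }
  }

  /** True once neither the scheduler nor any pool thread is still running. */
  boolean allThreadsStopped() {
    return !scheduler.isAlive() && probePool.isTerminated();
  }
}
```

For the full detector, the same close() logic would presumably have to cover each scheduler thread and each pool named in the list (probeDeadNodesThreadPool, probeSuspectNodesThreadPool, rpcThreadPool).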

> DeadNodeDetector should close all the threads when it is closed.
> 
>
> Key: HDFS-15806
> URL: https://issues.apache.org/jira/browse/HDFS-15806
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15806.001.patch
>
>
> The DeadNodeDetector doesn't close all the threads when it is closed. This
> Jira tries to fix this.






[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554604=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554604
 ]

ASF GitHub Bot logged work on HDFS-15781:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 06:25
Start Date: 19/Feb/21 06:25
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2704:
URL: https://github.com/apache/hadoop/pull/2704#issuecomment-781858678


   :broken_heart: **-1 overall**

   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
   |  |  |  |  | _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 1 new or modified test files.  |
   |  |  |  |  | _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  32m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |   1m 14s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  15m 56s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 26s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +0 :ok: |  spotbugs  |  21m 23s |  |  Both FindBugs and SpotBugs are enabled, using SpotBugs.  |
   | +1 :green_heart: |  spotbugs  |   3m  7s |  |  trunk passed  |
   |  |  |  |  | _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 11s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |   1m 11s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  shadedclient  |  12m 54s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 48s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | -1 :x: |  spotbugs  |   3m  2s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs cannot run computeBugHistory from spotbugs  |
   |  |  |  |  | _ Other Tests _ |
   | -1 :x: |  unit  | 196m 10s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 42s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 282m  5s |  |  |

   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.TestAddOverReplicatedStripedBlocks |
   |   | hadoop.hdfs.TestDecommissionWithStriped |
   |   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |

   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2704/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/2704 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle |
   | uname | Linux 5aa96703b036 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 2970bd93f3e |
   | Default Java | Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 |
   | Multi-JDK versions 

[jira] [Updated] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk intermittently

2021-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15840:

Summary: 
TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock 
fails on trunk intermittently  (was: 
TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock 
fails on trunk)

> TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock 
> fails on trunk intermittently
> 
>
> Key: HDFS-15840
> URL: https://issues.apache.org/jira/browse/HDFS-15840
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takanobu Asanuma
>Priority: Major
>
> Found in HDFS-15835.
> {quote}java.lang.AssertionError: expected:<10> but was:<11>
>  at org.junit.Assert.fail(Assert.java:88)
>  at org.junit.Assert.failNotEquals(Assert.java:834)
>  at org.junit.Assert.assertEquals(Assert.java:645)
>  at org.junit.Assert.assertEquals(Assert.java:631)
>  at 
> org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)
> {quote}






[jira] [Updated] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk

2021-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15840:

Description: 
Found in HDFS-15835.
{quote}java.lang.AssertionError: expected:<10> but was:<11>
 at org.junit.Assert.fail(Assert.java:88)
 at org.junit.Assert.failNotEquals(Assert.java:834)
 at org.junit.Assert.assertEquals(Assert.java:645)
 at org.junit.Assert.assertEquals(Assert.java:631)
 at 
org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)
{quote}

  was:
{quote}
java.lang.AssertionError: expected:<10> but was:<11>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{quote}


> TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock 
> fails on trunk
> -
>
> Key: HDFS-15840
> URL: https://issues.apache.org/jira/browse/HDFS-15840
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takanobu Asanuma
>Priority: Major
>
> Found in HDFS-15835.
> {quote}java.lang.AssertionError: expected:<10> but was:<11>
>  at org.junit.Assert.fail(Assert.java:88)
>  at org.junit.Assert.failNotEquals(Assert.java:834)
>  at org.junit.Assert.assertEquals(Assert.java:645)
>  at org.junit.Assert.assertEquals(Assert.java:631)
>  at 
> org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)
> {quote}






[jira] [Created] (HDFS-15840) TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock fails on trunk

2021-02-18 Thread Takanobu Asanuma (Jira)
Takanobu Asanuma created HDFS-15840:
---

 Summary: 
TestDecommissionWithStripedBackoffMonitor#testDecommissionWithMissingBlock 
fails on trunk
 Key: HDFS-15840
 URL: https://issues.apache.org/jira/browse/HDFS-15840
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Takanobu Asanuma


{quote}
java.lang.AssertionError: expected:<10> but was:<11>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:834)
at org.junit.Assert.assertEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:631)
at 
org.apache.hadoop.hdfs.TestDecommissionWithStriped.testDecommissionWithMissingBlock(TestDecommissionWithStriped.java:910)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
{quote}






[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging

2021-02-18 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286860#comment-17286860
 ] 

Takanobu Asanuma commented on HDFS-15835:
-

[~bpatel] I added you to the Hadoop contributor role. You can assign JIRAs to
yourself next time.

> Erasure coding: Add/remove logs for the better readability/debugging
> 
>
> Key: HDFS-15835
> URL: https://issues.apache.org/jira/browse/HDFS-15835
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15835.001.patch
>
>
> * Unnecessary NameNode logs are displayed when disabling EC policies that are
> already disabled.
> * There are no info/debug logs for addPolicy and unsetPolicy.






[jira] [Assigned] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging

2021-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma reassigned HDFS-15835:
---

Assignee: Bhavik Patel

> Erasure coding: Add/remove logs for the better readability/debugging
> 
>
> Key: HDFS-15835
> URL: https://issues.apache.org/jira/browse/HDFS-15835
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs
>Reporter: Bhavik Patel
>Assignee: Bhavik Patel
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15835.001.patch
>
>
> * Unnecessary NameNode logs are displayed when disabling EC policies that are
> already disabled.
> * There are no info/debug logs for addPolicy and unsetPolicy.






[jira] [Updated] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging

2021-02-18 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma updated HDFS-15835:

Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

The failed test is not related.

Committed to trunk and branch-3.3.  Thanks for your contribution, [~bpatel].

> Erasure coding: Add/remove logs for the better readability/debugging
> 
>
> Key: HDFS-15835
> URL: https://issues.apache.org/jira/browse/HDFS-15835
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs
>Reporter: Bhavik Patel
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15835.001.patch
>
>
> * Unnecessary NameNode logs are displayed when disabling EC policies that are
> already disabled.
> * There are no info/debug logs for addPolicy and unsetPolicy.






[jira] [Commented] (HDFS-15835) Erasure coding: Add/remove logs for the better readability/debugging

2021-02-18 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286853#comment-17286853
 ] 

Takanobu Asanuma commented on HDFS-15835:
-

+1 on [^HDFS-15835.001.patch]. I will fix the checkstyle issue when committing 
it.

> Erasure coding: Add/remove logs for the better readability/debugging
> 
>
> Key: HDFS-15835
> URL: https://issues.apache.org/jira/browse/HDFS-15835
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: erasure-coding, hdfs
>Reporter: Bhavik Patel
>Priority: Minor
> Attachments: HDFS-15835.001.patch
>
>
> * Unnecessary NameNode logs are displayed when disabling EC policies that are
> already disabled.
> * There are no info/debug logs for addPolicy and unsetPolicy.






[jira] [Comment Edited] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.

2021-02-18 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286832#comment-17286832
 ] 

Jinglun edited comment on HDFS-15809 at 2/19/21, 3:23 AM:
--

Hi [~leosun08], thanks for your comments. The solution in v01 introduces a new
deduplicated queue that doesn't accept a node that is already queued. The queue
is also unbounded, so all the dead nodes can be added to it. Thus duplicated
dead nodes can no longer be repeatedly added to the probe queue.

Because the queue itself is deduplicated, we don't need to worry about the queue
size exploding: its size is never greater than the number of datanodes.

Shuffle is a good idea and a much simpler approach. But I think the deduplicated
approach is more efficient because there are no duplicated probes.

Adjusting the queue size won't fix the problem because the queue accepts
duplicated nodes. Even if the queue size is 10, it could still be filled up with
the first 30 nodes.
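A deduplicated queue like the one described above can be sketched as follows. This is a minimal illustration under assumptions, not the v01 patch: `DedupProbeQueue` is a hypothetical name, nodes are plain strings, and one lock guards a `LinkedHashSet`, which keeps FIFO order while rejecting nodes that are already queued. Because a node appears at most once, the queue never holds more entries than there are distinct datanodes, even though it is unbounded.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Hypothetical unbounded FIFO queue that ignores nodes already waiting to be probed. */
class DedupProbeQueue {
  // Insertion-ordered set: iteration order is FIFO and duplicates are rejected.
  private final LinkedHashSet<String> nodes = new LinkedHashSet<>();

  /** Adds the node unless it is already queued; returns true if it was added. */
  synchronized boolean offer(String node) {
    return nodes.add(node);
  }

  /** Removes and returns the oldest queued node, or null if the queue is empty. */
  synchronized String poll() {
    Iterator<String> it = nodes.iterator();
    if (!it.hasNext()) {
      return null;
    }
    String head = it.next();
    it.remove();
    return head;
  }

  synchronized int size() {
    return nodes.size();
  }
}
```

Once a node has been polled for probing it may be offered again, so a node that is still dead gets re-probed on a later round, but never twice in the same pass.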

 


was (Author: lijinglun):
Hi [~leosun08], thanks for your comments. The solution in v01 is to avoid adding
duplicated dead nodes to the probe queue, so the queue won't be filled up with
duplicated dead nodes.

Shuffle is a good idea and a much simpler approach. I also agree with the shuffle
approach.

 

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --
>
> Key: HDFS-15809
> URL: https://issues.apache.org/jira/browse/HDFS-15809
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15809.001.patch
>
>
> We found the dead node detector might never remove the alive nodes from the
> dead node set in a big cluster. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the
> deadNodesProbeQueue because the queue length is limited to 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the dead
> node set and adds 30 nodes to the deadNodesProbeQueue. But the order is the
> same as last time, so the 30 nodes that have already been probed are added
> to the queue again.
>  # Steps 3 and 4 repeat, but we always add the first 30 nodes from the dead set. If
> they are all dead, the live nodes behind them can never be recovered.
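The starvation in the steps above can be reproduced with a small simulation. This is a hypothetical model, not DeadNodeDetector itself: the dead node set is an insertion-ordered set of node IDs 1 to 200, the probe queue is an `ArrayBlockingQueue` capped at 100 that accepts duplicates, each round probes 30 nodes, and nodes 1 to 30 are assumed to stay dead while every other node is alive and leaves the set once probed. Refilling in the same fixed order means no node beyond the first 100 is ever probed.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical model of the HDFS-15809 scenario; not the real DeadNodeDetector. */
class FixedOrderProbeSimulation {
  static final int NODES = 200, QUEUE_CAPACITY = 100, PROBES_PER_ROUND = 30, REALLY_DEAD = 30;

  final Set<Integer> deadSet = new LinkedHashSet<>();  // iteration order never changes
  final BlockingQueue<Integer> probeQueue = new ArrayBlockingQueue<>(QUEUE_CAPACITY);  // allows duplicates
  int maxProbedNode = 0;

  FixedOrderProbeSimulation() {
    for (int id = 1; id <= NODES; id++) {
      deadSet.add(id);
    }
  }

  /** Models checkDeadNodes(): offer dead nodes in the same order until the queue is full. */
  void refill() {
    for (int id : deadSet) {
      if (!probeQueue.offer(id)) {
        break;  // queue full
      }
    }
  }

  /** Probes up to 30 queued nodes; nodes above REALLY_DEAD are alive and leave the dead set. */
  void probeRound() {
    for (int i = 0; i < PROBES_PER_ROUND; i++) {
      Integer id = probeQueue.poll();
      if (id == null) {
        return;
      }
      maxProbedNode = Math.max(maxProbedNode, id);
      if (id > REALLY_DEAD) {
        deadSet.remove(id);
      }
    }
  }

  void run(int rounds) {
    refill();
    for (int r = 0; r < rounds; r++) {
      probeRound();
      refill();
    }
  }
}
```

After a few rounds the queue holds only repeated copies of nodes 1 to 30, so nodes 101 to 200 stay in the dead set no matter how many rounds run.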






[jira] [Commented] (HDFS-15809) DeadNodeDetector doesn't remove live nodes from dead node set.

2021-02-18 Thread Jinglun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286832#comment-17286832
 ] 

Jinglun commented on HDFS-15809:


Hi [~leosun08], thanks for your comments. The solution in v01 is to avoid adding
duplicated dead nodes to the probe queue, so the queue won't be filled up with
duplicated dead nodes.

Shuffle is a good idea and a much simpler approach. I also agree with the shuffle
approach.
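The shuffle approach mentioned above can be sketched like this. It is a hypothetical helper, not the committed fix: before offering dead nodes to the bounded probe queue, it copies the dead node set and shuffles the copy with `Collections.shuffle`, so each refill starts from a different node and nodes late in the set's iteration order eventually get a turn. Passing in a `Random` keeps runs reproducible when a fixed seed is used.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.BlockingQueue;

/** Hypothetical refill step that shuffles candidates before offering them to a bounded queue. */
class ShuffledRefill {
  /** Offers dead nodes in random order until the queue is full; returns how many were added. */
  static <T> int refill(Collection<T> deadNodes, BlockingQueue<T> probeQueue, Random random) {
    List<T> candidates = new ArrayList<>(deadNodes);  // copy: the dead set itself is untouched
    Collections.shuffle(candidates, random);
    int added = 0;
    for (T node : candidates) {
      if (!probeQueue.offer(node)) {
        break;  // queue is full
      }
      added++;
    }
    return added;
  }
}
```

The trade-off against a deduplicated queue is that shuffling does not prevent the same node from being queued twice; it only makes it unlikely that the same prefix of the set wins every refill.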

 

> DeadNodeDetector doesn't remove live nodes from dead node set.
> --
>
> Key: HDFS-15809
> URL: https://issues.apache.org/jira/browse/HDFS-15809
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15809.001.patch
>
>
> We found the dead node detector might never remove the alive nodes from the
> dead node set in a big cluster. For example:
>  # 200 nodes are added to the dead node set by DeadNodeDetector.
>  # DeadNodeDetector#checkDeadNodes() adds 100 nodes to the
> deadNodesProbeQueue because the queue length is limited to 100.
>  # The probe threads start working and probe 30 nodes.
>  # DeadNodeDetector#checkDeadNodes() is scheduled again. It iterates over the dead
> node set and adds 30 nodes to the deadNodesProbeQueue. But the order is the
> same as last time, so the 30 nodes that have already been probed are added
> to the queue again.
>  # Steps 3 and 4 repeat, but we always add the first 30 nodes from the dead set. If
> they are all dead, the live nodes behind them can never be recovered.






[jira] [Comment Edited] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286811#comment-17286811
 ] 

tomscut edited comment on HDFS-15808 at 2/19/21, 2:29 AM:
--

Hi [~shv]. Thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing, similar to
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate or
the amount of change and set alerts based on that. We can also combine these
metrics with lock-detailed-metrics (https://issues.apache.org/jira/browse/HDFS-10872)
to help further optimize performance.

For example, we use Prometheus to store metrics and use the expression
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])"
for monitoring.
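To make the "rate of change" idea concrete, here is a minimal sketch of what such an expression derives from a monotonically increasing counter: the increase between consecutive samples, treating a drop in value as a counter reset (for example after a NameNode restart). Prometheus's documentation recommends `increase()` or `rate()` rather than `delta()` for counters, because only those compensate for resets. This is a simplified, hypothetical helper, not Prometheus's implementation (which also extrapolates to the window boundaries); `CounterIncrease`, `increase`, and `perSecondRate` are invented names.

```java
/** Hypothetical reset-aware increase/rate over counter samples ordered oldest first. */
class CounterIncrease {
  /** Total growth across the samples; a decrease is treated as a reset to zero. */
  static long increase(long[] samples) {
    long total = 0;
    for (int i = 1; i < samples.length; i++) {
      long diff = samples[i] - samples[i - 1];
      // After a reset the counter restarted from 0, so its current value is all new growth.
      total += diff >= 0 ? diff : samples[i];
    }
    return total;
  }

  /** Average increase per second over a window of the given length. */
  static double perSecondRate(long[] samples, long windowSeconds) {
    return (double) increase(samples) / windowSeconds;
  }
}
```

An alert can then fire when the increase over the last window exceeds a threshold, which avoids the problem of an ever-growing raw counter being hard to read on a graph.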

The following is a graph of monitoring data.

[^lockLongHoldCount]


was (Author: tomscut):
Hi [~shv]. Thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing, similar to
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate or
the amount of change and set alerts based on that. We can also combine these
metrics with lock-detailed-metrics (https://issues.apache.org/jira/browse/HDFS-10872)
to help further optimize performance.

For example, we use Prometheus to store metrics and use the expression
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])"
for monitoring.

[^lockLongHoldCount]

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.
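The proposal above can be illustrated with a small wrapper that times how long a read or write lock is held and bumps a counter whenever the hold exceeds a threshold. This is a hypothetical sketch, not FSNamesystemLock: `TimedLockSketch`, `thresholdNanos`, and the counter names are invented, exposing the counters through JMX is left out, and the clock is injectable (for example `System::nanoTime`) so the example can be tested deterministically. Only the outermost hold of a reentrant lock is timed.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

/** Hypothetical lock wrapper that counts read/write holds longer than a threshold. */
class TimedLockSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
  private final long thresholdNanos;
  private final LongSupplier clock;  // e.g. System::nanoTime in production
  private final AtomicLong readLongHoldCount = new AtomicLong();
  private final AtomicLong writeLongHoldCount = new AtomicLong();
  private final ThreadLocal<Long> readAcquiredAt = new ThreadLocal<>();  // per reader thread
  private long writeAcquiredAt;  // only touched by the thread holding the write lock

  TimedLockSketch(long thresholdNanos, LongSupplier clock) {
    this.thresholdNanos = thresholdNanos;
    this.clock = clock;
  }

  void readLock() {
    lock.readLock().lock();
    if (lock.getReadHoldCount() == 1) {  // start timing only the outermost hold
      readAcquiredAt.set(clock.getAsLong());
    }
  }

  void readUnlock() {
    boolean outermost = lock.getReadHoldCount() == 1;
    long held = outermost ? clock.getAsLong() - readAcquiredAt.get() : 0;
    lock.readLock().unlock();
    if (outermost && held > thresholdNanos) {
      readLongHoldCount.incrementAndGet();
    }
  }

  void writeLock() {
    lock.writeLock().lock();
    if (lock.getWriteHoldCount() == 1) {
      writeAcquiredAt = clock.getAsLong();
    }
  }

  void writeUnlock() {
    boolean outermost = lock.getWriteHoldCount() == 1;
    long held = outermost ? clock.getAsLong() - writeAcquiredAt : 0;
    lock.writeLock().unlock();
    if (outermost && held > thresholdNanos) {
      writeLongHoldCount.incrementAndGet();
    }
  }

  long getReadLongHoldCount() {
    return readLongHoldCount.get();
  }

  long getWriteLongHoldCount() {
    return writeLongHoldCount.get();
  }
}
```

Like the metrics discussed in this issue, these counters only ever grow, so a monitoring system would look at their rate or increase rather than the raw value.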






[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=554555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554555
 ]

ASF GitHub Bot logged work on HDFS-15808:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 01:57
Start Date: 19/Feb/21 01:57
Worklog Time Spent: 10m 
  Work Description: tomscut commented on pull request #2668:
URL: https://github.com/apache/hadoop/pull/2668#issuecomment-781760195


   > Just reposting my comment from the jira for visibility.
   > 
   > The patch looks fine, but I doubt the metric will be useful in its current 
form. A monotonically increasing counter doesn't tell you much when plotted. Over 
time it just becomes an incredibly large number, hard to see its fluctuations. 
And you cannot set alerts if the threshold is exceeded often.
   > See e.g. ExpiredHeartbeats or LastWrittenTransactionId - not useful.
   > I assume you need something like a rate.
   
   Hey @shvachko, thank you for your comments and suggestions. I replied to 
you in JIRA.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554555)
Time Spent: 4.5h  (was: 4h 20m)

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.






[jira] [Comment Edited] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286811#comment-17286811
 ] 

tomscut edited comment on HDFS-15808 at 2/19/21, 1:54 AM:
--

Hi [~shv]. Thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing, similar to
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate or
the amount of change and set alerts based on that. We can also combine these
metrics with lock-detailed-metrics (https://issues.apache.org/jira/browse/HDFS-10872)
to help further optimize performance.

For example, we use Prometheus to store metrics and use the expression
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])"
for monitoring.

[^lockLongHoldCount]


was (Author: tomscut):
Hi [~shv]. Thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing, similar to
RpcQueueTimeNumOps and RpcProcessingTimeNumOps. But we can calculate the rate or
the amount of change and set alerts based on that. We can also combine these
metrics with lock-detailed-metrics (https://issues.apache.org/jira/browse/HDFS-10872)
to help further optimize performance.

For example, we use Prometheus to store metrics and use the expression
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])"
for monitoring.

 

[^lockLongHoldCount]

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.






[jira] [Updated] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread tomscut (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tomscut updated HDFS-15808:
---
Attachment: lockLongHoldCount

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning), which are exposed in JMX.






[jira] [Commented] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread tomscut (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286811#comment-17286811
 ] 

tomscut commented on HDFS-15808:


Hi [~shv], thank you for your reply and suggestions.

These two metrics are indeed monotonically increasing, similar to RpcQueueTimeNumOps and 
RpcProcessingTimeNumOps. However, we can compute their rate of change or the 
amount of change and set alarms based on that. We can also combine these metrics 
with the detailed lock metrics from HDFS-10872 
(https://issues.apache.org/jira/browse/HDFS-10872) to help further optimize performance.

For example, we store the metrics in Prometheus and monitor them with the expression 
"delta(hadoop_fsNamesystem_writeLockLongholdCount{instance=~"$hosts"}[1m])".

 

[^lockLongHoldCount]

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
> Attachments: lockLongHoldCount
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed their thresholds, we can add two 
> metrics (ReadLockWarning/WriteLockWarning) and expose them in JMX.






[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554548=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554548
 ]

ASF GitHub Bot logged work on HDFS-15830:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 01:31
Start Date: 19/Feb/21 01:31
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #2694:
URL: https://github.com/apache/hadoop/pull/2694#issuecomment-781751448


   cherry-picked to branch-3.3



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554548)
Time Spent: 1h 10m  (was: 1h)

> Support to make dfs.image.parallel.load reconfigurable
> --
>
> Key: HDFS-15830
> URL: https://issues.apache.org/jira/browse/HDFS-15830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> After HDFS-14617, fsimage loading improved a lot.
> If something unexpected happens, we have to load the old image to restart the 
> namenode.
> So we suggest making dfs.image.parallel.load reconfigurable, so that we can 
> save a new fsimage.
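Making a setting reconfigurable at runtime can be sketched as a volatile field that a reconfiguration call validates and swaps, and that the fsimage code reads each time it runs. This is a simplified stand-in, not Hadoop's implementation: the class `ImageLoadSettings` and its methods are hypothetical, only the key `dfs.image.parallel.load` comes from the issue, and the default of `false` restored on a null value is an assumption. On a live cluster, NameNode properties that support it are changed with `hdfs dfsadmin -reconfig namenode <host:ipc_port> start`.

```java
import java.util.Locale;

/** Hypothetical holder for a runtime-reconfigurable boolean setting. */
final class ImageLoadSettings {
    static final String PARALLEL_LOAD_KEY = "dfs.image.parallel.load";

    // volatile so a change made by the reconfiguration thread is visible
    // to the thread that later loads or saves the fsimage
    private volatile boolean parallelLoad;

    ImageLoadSettings(boolean initial) {
        this.parallelLoad = initial;
    }

    /**
     * Applies a new value for the given property. A null value restores the
     * default (assumed here to be false). Returns the value now in effect.
     * An invalid value is rejected and leaves the current setting unchanged.
     */
    boolean reconfigure(String property, String newValue) {
        if (!PARALLEL_LOAD_KEY.equals(property)) {
            throw new IllegalArgumentException("Not reconfigurable: " + property);
        }
        if (newValue == null) {
            parallelLoad = false;
        } else {
            String v = newValue.trim().toLowerCase(Locale.ROOT);
            if (!v.equals("true") && !v.equals("false")) {
                throw new IllegalArgumentException("Expected true or false, got: " + newValue);
            }
            parallelLoad = Boolean.parseBoolean(v);
        }
        return parallelLoad;
    }

    boolean isParallelLoad() {
        return parallelLoad;
    }

    public static void main(String[] args) {
        ImageLoadSettings settings = new ImageLoadSettings(true);
        // turn the setting off at runtime, without restarting the process
        settings.reconfigure(PARALLEL_LOAD_KEY, "false");
        System.out.println(PARALLEL_LOAD_KEY + "=" + settings.isParallelLoad());
    }
}
```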






[jira] [Resolved] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable

2021-02-18 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-15830.

Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed

> Support to make dfs.image.parallel.load reconfigurable
> --
>
> Key: HDFS-15830
> URL: https://issues.apache.org/jira/browse/HDFS-15830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After HDFS-14617, fsimage loading improved a lot.
> If something unexpected happens, we have to load the old image to restart the 
> namenode.
> So we suggest making dfs.image.parallel.load reconfigurable, so that we can 
> save a new fsimage.






[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554542
 ]

ASF GitHub Bot logged work on HDFS-15830:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 01:07
Start Date: 19/Feb/21 01:07
Worklog Time Spent: 10m 
  Work Description: ferhui commented on pull request #2694:
URL: https://github.com/apache/hadoop/pull/2694#issuecomment-781742400


   @sodonnel @dineshchitlangia Thanks for the review!
   Merged to trunk.





Issue Time Tracking
---

Worklog Id: (was: 554542)
Time Spent: 50m  (was: 40m)

> Support to make dfs.image.parallel.load reconfigurable
> --
>
> Key: HDFS-15830
> URL: https://issues.apache.org/jira/browse/HDFS-15830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> After HDFS-14617, fsimage loading improved a lot.
> If something unexpected happens, we have to load the old image to restart the 
> namenode.
> So we suggest making dfs.image.parallel.load reconfigurable, so that we can 
> save a new fsimage.






[jira] [Work logged] (HDFS-15830) Support to make dfs.image.parallel.load reconfigurable

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15830?focusedWorklogId=554544=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554544
 ]

ASF GitHub Bot logged work on HDFS-15830:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 01:07
Start Date: 19/Feb/21 01:07
Worklog Time Spent: 10m 
  Work Description: ferhui merged pull request #2694:
URL: https://github.com/apache/hadoop/pull/2694


   





Issue Time Tracking
---

Worklog Id: (was: 554544)
Time Spent: 1h  (was: 50m)

> Support to make dfs.image.parallel.load reconfigurable
> --
>
> Key: HDFS-15830
> URL: https://issues.apache.org/jira/browse/HDFS-15830
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Hui Fei
>Assignee: Hui Fei
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> After HDFS-14617, fsimage loading improved a lot.
> If something unexpected happens, we have to load the old image to restart the 
> namenode.
> So we suggest making dfs.image.parallel.load reconfigurable, so that we can 
> save a new fsimage.






[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554538=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554538
 ]

ASF GitHub Bot logged work on HDFS-15785:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 00:53
Start Date: 19/Feb/21 00:53
Worklog Time Spent: 10m 
  Work Description: LeonGao91 edited a comment on pull request #2639:
URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781734960


   @fengnanli @goiri Could you help take a look at the change?
   
   (Somehow Jenkins has started using SpotBugs, but it is not working.)
   





Issue Time Tracking
---

Worklog Id: (was: 554538)
Time Spent: 2h 20m  (was: 2h 10m)

> Datanode to support using DNS to resolve nameservices to IP addresses to get 
> list of namenodes
> --
>
> Key: HDFS-15785
> URL: https://issues.apache.org/jira/browse/HDFS-15785
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, as HDFS supports observers, multiple standbys and routers, the 
> namenode hosts change frequently in large deployments. We can consider 
> supporting https://issues.apache.org/jira/browse/HDFS-14118 on the datanode to 
> reduce the need to update the config frequently on all datanodes. In that case, 
> datanodes and clients can also use the same set of configs.
> Basically, we can resolve the DNS name and generate a namenode for each IP behind it.
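The last point, resolving one DNS name and treating every address behind it as a namenode, can be sketched with the JDK's `InetAddress.getAllByName`. This is not the code from the pull request: the class `NameserviceDnsResolver` is hypothetical, and port 8020 is assumed here as the namenode RPC port.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expand a DNS name into one address per IP behind it. */
final class NameserviceDnsResolver {

    static List<InetSocketAddress> resolve(String host, int port) throws UnknownHostException {
        List<InetSocketAddress> result = new ArrayList<>();
        // getAllByName returns every address the configured name service knows
        // for the name; an IP literal is returned as-is without a DNS lookup
        for (InetAddress address : InetAddress.getAllByName(host)) {
            result.add(new InetSocketAddress(address, port));
        }
        return result;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = args.length > 0 ? args[0] : "localhost";
        for (InetSocketAddress nn : resolve(host, 8020)) {
            System.out.println("namenode candidate: " + nn);
        }
    }
}
```

With this approach, adding or removing a namenode only requires updating the DNS record rather than the configuration on every datanode.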






[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554536=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554536
 ]

ASF GitHub Bot logged work on HDFS-15785:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 00:48
Start Date: 19/Feb/21 00:48
Worklog Time Spent: 10m 
  Work Description: LeonGao91 commented on pull request #2639:
URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781734960


   Somehow it has started using SpotBugs, and it is not working (some change in 
Yetus?).
   
   @fengnanli @goiri Could you help take a look?





Issue Time Tracking
---

Worklog Id: (was: 554536)
Time Spent: 2h 10m  (was: 2h)

> Datanode to support using DNS to resolve nameservices to IP addresses to get 
> list of namenodes
> --
>
> Key: HDFS-15785
> URL: https://issues.apache.org/jira/browse/HDFS-15785
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, as HDFS supports observers, multiple standbys and routers, the 
> namenode hosts change frequently in large deployments. We can consider 
> supporting https://issues.apache.org/jira/browse/HDFS-14118 on the datanode to 
> reduce the need to update the config frequently on all datanodes. In that case, 
> datanodes and clients can also use the same set of configs.
> Basically, we can resolve the DNS name and generate a namenode for each IP behind it.






[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554532=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554532
 ]

ASF GitHub Bot logged work on HDFS-15781:
-

Author: ASF GitHub Bot
Created on: 19/Feb/21 00:24
Start Date: 19/Feb/21 00:24
Worklog Time Spent: 10m 
  Work Description: LeonGao91 commented on a change in pull request #2704:
URL: https://github.com/apache/hadoop/pull/2704#discussion_r578840318



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
##
@@ -188,6 +188,15 @@
   @Metric MutableCounterLong packetsSlowWriteToDisk;
   @Metric MutableCounterLong packetsSlowWriteToOsCache;
 
+  @Metric("Number of replaceBlock ops between" +
+  " storage types on same host with local copy")
+  private MutableCounterLong replaceBlockOpOnSameHostWithCopy;
+  @Metric("Number of replaceBlock ops between" +
+  " storage types on same disk mount using hardlink")
+  private MutableCounterLong replaceBlockOpOnSameHostWithHardlink;

Review comment:
   Yeah, sounds good. OnSameHost can actually include OnSameMount. I will 
make the change.







Issue Time Tracking
---

Worklog Id: (was: 554532)
Time Spent: 50m  (was: 40m)

> Add metrics for how blocks are moved in replaceBlock
> 
>
> Key: HDFS-15781
> URL: https://issues.apache.org/jira/browse/HDFS-15781
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We can add some metrics to track how the blocks are being moved, to get 
> a sense of the locality of movements.
>  * How many blocks are copied to the local host?
>  * How many blocks are moved to a local disk through a hardlink?
>  * How many blocks are copied out of the host?
>  
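The three questions above map naturally onto three counters. The following is a minimal sketch. It assumes the DataNode already knows whether the source and target storage are on the same host and on the same mount. The class `ReplaceBlockMoveMetrics` and its counter names are hypothetical and are not the `replaceBlockOpOnSameHost...` fields in the pull request. It follows one reading of the review comment ("OnSameHost can include OnSameMount"): a hardlink move on the same mount is also counted as a same-host move.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters describing where replaceBlock moved a block. */
final class ReplaceBlockMoveMetrics {
    private final LongAdder onSameHost = new LongAdder();  // any move within the host
    private final LongAdder onSameMount = new LongAdder(); // subset of onSameHost done via hardlink
    private final LongAdder toOtherHost = new LongAdder(); // block copied out of the host

    /**
     * Records one move. sameMount implies sameHost: a hardlink is only
     * possible when source and target live on the same mount.
     */
    void record(boolean sameHost, boolean sameMount) {
        if (sameMount && !sameHost) {
            throw new IllegalArgumentException("same mount implies same host");
        }
        if (sameHost) {
            onSameHost.increment();
            if (sameMount) {
                onSameMount.increment();
            }
        } else {
            toOtherHost.increment();
        }
    }

    long sameHost() { return onSameHost.sum(); }

    long sameMount() { return onSameMount.sum(); }

    long otherHost() { return toOtherHost.sum(); }

    public static void main(String[] args) {
        ReplaceBlockMoveMetrics m = new ReplaceBlockMoveMetrics();
        m.record(true, true);   // between storage types on the same mount: hardlink
        m.record(true, false);  // different mount on the same host: local copy
        m.record(false, false); // moved to another DataNode
        System.out.println("sameHost=" + m.sameHost() + " sameMount=" + m.sameMount()
                + " otherHost=" + m.otherHost());
    }
}
```

`LongAdder` is used because these counters may be incremented concurrently by many transfer threads and are read only occasionally.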






[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client

2021-02-18 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286779#comment-17286779
 ] 

Hadoop QA commented on HDFS-15839:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 35s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} yetus {color} | {color:red}  0m  8s{color} | {color:red}{color} | {color:red} Unprocessed flag(s): --findbugs-strict-precheck {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/480/artifact/out/Dockerfile |
| JIRA Issue | HDFS-15839 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13020660/HDFS-15839.001.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/480/console |
| versions | git=2.25.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> RBF: Cannot get method setBalancerBandwidth on Router Client
> 
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15839.001.patch, HDFS-15839.patch
>
>
> When calling setBalancerBandwidth, the Router throws an exception:
> {code:java}
> 02-18 14:39:59,186 [IPC Server handler 0 on default port 43545] ERROR router.RemoteMethod (RemoteMethod.java:getMethod(146)) - Cannot get method setBalancerBandwidth with types [class java.lang.Long] from ClientProtocol
> java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.protocol.ClientProtocol.setBalancerBandwidth(java.lang.Long)
>     at java.lang.Class.getDeclaredMethod(Class.java:2130)
>     at org.apache.hadoop.hdfs.server.federation.router.RemoteMethod.getMethod(RemoteMethod.java:140)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1312)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1250)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1221)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1194)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.setBalancerBandwidth(RouterClientProtocol.java:1188)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setBalancerBandwidth(RouterRpcServer.java:1211)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setBalancerBandwidth(ClientNamenodeProtocolServerSideTranslatorPB.java:1254)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:537)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1037)
>     at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:965)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2972)
> {code}
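The log shows the Router looking up `setBalancerBandwidth` with parameter type `java.lang.Long`. Java reflection matches parameter types exactly and does not unbox. If the protocol method is declared with a primitive `long`, which is how `ClientProtocol.setBalancerBandwidth` is declared, the lookup most likely needs `long.class` rather than `Long.class`. The self-contained sketch below reproduces the failure with a hypothetical stand-in interface. It is not the Router code itself.

```java
import java.lang.reflect.Method;

final class PrimitiveParamLookup {

    /** Stand-in for a protocol interface whose method takes a primitive long. */
    interface BandwidthProtocol {
        void setBalancerBandwidth(long bandwidth);
    }

    /** Returns the method if it can be found with the given parameter type, else null. */
    static Method find(Class<?> paramType) {
        try {
            return BandwidthProtocol.class.getDeclaredMethod("setBalancerBandwidth", paramType);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Boxed type: no declared method has a java.lang.Long parameter
        System.out.println("Long.class -> " + find(Long.class));
        // Primitive type (equivalently Long.TYPE) matches the declaration
        System.out.println("long.class -> " + find(long.class));
    }
}
```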






[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client

2021-02-18 Thread Yang Yun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286778#comment-17286778
 ] 

Yang Yun commented on HDFS-15839:
-

Thanks [~ayushtkn] for your review.

Updated to HDFS-15839.001.patch to simplify the test.

> RBF: Cannot get method setBalancerBandwidth on Router Client
> 
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15839.001.patch, HDFS-15839.patch
>
>






[jira] [Updated] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client

2021-02-18 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15839:

Attachment: HDFS-15839.001.patch
Status: Patch Available  (was: Open)

> RBF: Cannot get method setBalancerBandwidth on Router Client
> 
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15839.001.patch, HDFS-15839.patch
>
>






[jira] [Updated] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client

2021-02-18 Thread Yang Yun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Yun updated HDFS-15839:

Status: Open  (was: Patch Available)

> RBF: Cannot get method setBalancerBandwidth on Router Client
> 
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15839.patch
>
>






[jira] [Work logged] (HDFS-15785) Datanode to support using DNS to resolve nameservices to IP addresses to get list of namenodes

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15785?focusedWorklogId=554520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554520
 ]

ASF GitHub Bot logged work on HDFS-15785:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 23:30
Start Date: 18/Feb/21 23:30
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #2639:
URL: https://github.com/apache/hadoop/pull/2639#issuecomment-781703710


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 33s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |   |   0m  0s | [test4tests](test4tests) |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 34s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  20m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  20m 37s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  17m 56s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   3m 57s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   4m 12s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m  3s |  |  trunk passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   4m  4s |  |  trunk passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +0 :ok: |  spotbugs  |  37m  5s |  |  Both FindBugs and SpotBugs are enabled, using SpotBugs.  |
   | +1 :green_heart: |  spotbugs  |   8m  6s |  |  trunk passed  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 53s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 54s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  19m 54s |  |  root-jdkUbuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 generated 0 new + 1957 unchanged - 2 fixed = 1957 total (was 1959)  |
   | +1 :green_heart: |  compile  |  17m 59s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | +1 :green_heart: |  javac  |  17m 59s |  |  root-jdkPrivateBuild-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 generated 0 new + 1852 unchanged - 2 fixed = 1852 total (was 1854)  |
   | +1 :green_heart: |  checkstyle  |   3m 55s |  |  root: The patch generated 0 new + 557 unchanged - 2 fixed = 557 total (was 559)  |
   | +1 :green_heart: |  mvnsite  |   4m  6s |  |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  xml  |   0m  1s |  |  The patch has no ill-formed XML file.  |
   | +1 :green_heart: |  shadedclient  |  13m 15s |  |  patch has no errors when building and testing our client artifacts.  |
   | +1 :green_heart: |  javadoc  |   3m  3s |  |  the patch passed with JDK Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   4m  3s |  |  the patch passed with JDK Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08  |
   | -1 :x: |  spotbugs  |   2m 18s | [/patch-spotbugs-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-common-project_hadoop-common.txt) |  hadoop-common-project/hadoop-common cannot run computeBugHistory from spotbugs  |
   | -1 :x: |  spotbugs  |   2m 36s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs-client.txt) |  hadoop-hdfs-project/hadoop-hdfs-client cannot run computeBugHistory from spotbugs  |
   | -1 :x: |  spotbugs  |   3m 14s | [/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2639/7/artifact/out/patch-spotbugs-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs cannot run computeBugHistory from spotbugs  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  17m 24s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: 

[jira] [Work logged] (HDFS-15781) Add metrics for how blocks are moved in replaceBlock

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15781?focusedWorklogId=554517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554517
 ]

ASF GitHub Bot logged work on HDFS-15781:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 23:19
Start Date: 18/Feb/21 23:19
Worklog Time Spent: 10m 
  Work Description: Jing9 commented on a change in pull request #2704:
URL: https://github.com/apache/hadoop/pull/2704#discussion_r578815508



##
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/metrics/DataNodeMetrics.java
##
@@ -188,6 +188,15 @@
   @Metric MutableCounterLong packetsSlowWriteToDisk;
   @Metric MutableCounterLong packetsSlowWriteToOsCache;
 
+  @Metric("Number of replaceBlock ops between" +
+  " storage types on same host with local copy")
+  private MutableCounterLong replaceBlockOpOnSameHostWithCopy;
+  @Metric("Number of replaceBlock ops between" +
+  " storage types on same disk mount using hardlink")
+  private MutableCounterLong replaceBlockOpOnSameHostWithHardlink;

Review comment:
   Both "withHardlink" and "withCopy" describe our current block movement 
implementation. If we change the implementation in the future, these names may 
no longer hold. How about we change the metric names to "OnSameHost" and 
"OnSameMount"? But then we need to think more about their semantic meanings. 
Maybe "OnSameHost" also includes "OnSameMount"... Thoughts?
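A minimal sketch of the naming idea in the comment above, assuming the inclusive interpretation in which "OnSameHost" counts every move that stays on the host and "OnSameMount" counts the subset that also stays on one disk mount. The class and field names are hypothetical, and plain `AtomicLong` fields stand in for Hadoop's metric types:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: counters named by WHERE a replaceBlock move lands,
// not by HOW it is implemented (hardlink vs. copy).
class ReplaceBlockLocalityCounters {
  // All moves that stayed on the same host, including same-mount moves.
  final AtomicLong onSameHost = new AtomicLong();
  // Moves that stayed on the same disk mount (a subset of onSameHost).
  final AtomicLong onSameMount = new AtomicLong();

  /** Records one move; a same-mount move is by definition also same-host. */
  void record(boolean sameHost, boolean sameMount) {
    if (sameMount && !sameHost) {
      throw new IllegalArgumentException("same mount implies same host");
    }
    if (sameHost) {
      onSameHost.incrementAndGet();
    }
    if (sameMount) {
      onSameMount.incrementAndGet();
    }
  }

  /** Same-host moves that crossed mounts, derived by subtraction. */
  long onSameHostOtherMount() {
    return onSameHost.get() - onSameMount.get();
  }
}
```

With inclusive counters the same-host/different-mount count (the case the patch currently handles with a local copy) is still recoverable by subtraction, so neither name depends on the movement mechanism. The two reads in `onSameHostOtherMount()` are not atomic together, which is usually acceptable for monitoring.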





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 554517)
Time Spent: 40m  (was: 0.5h)

> Add metrics for how blocks are moved in replaceBlock
> 
>
> Key: HDFS-15781
> URL: https://issues.apache.org/jira/browse/HDFS-15781
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Leon Gao
>Assignee: Leon Gao
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We can add some metrics to track how blocks are being moved, to get 
> a sense of the locality of movements.
>  * How many blocks are copied to the local host?
>  * How many blocks are moved to the local disk through a hardlink?
>  * How many blocks are copied out of the host?
>  






[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=554448=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554448
 ]

ASF GitHub Bot logged work on HDFS-15808:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 20:18
Start Date: 18/Feb/21 20:18
Worklog Time Spent: 10m 
  Work Description: shvachko edited a comment on pull request #2668:
URL: https://github.com/apache/hadoop/pull/2668#issuecomment-781609671


   Just reposting my comment from the jira for visibility.
   
   The patch looks fine, but I doubt the metric will be useful in its current 
form. A monotonically increasing counter doesn't tell you much when plotted: over 
time it just becomes an incredibly large number, and its fluctuations are hard to see. 
You also cannot set alerts for when the threshold is exceeded often.
   See e.g. ExpiredHeartbeats or LastWrittenTransactionId, which are not useful for this reason.
   I assume you need something like a rate.
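One way to get "something like a rate" from a cumulative counter is to sample it periodically and divide the change by the elapsed time. The sketch below assumes the caller supplies the current counter value and a millisecond timestamp; `CounterRateSampler` is a hypothetical name, not a Hadoop class:

```java
// Hypothetical sketch: convert a monotonically increasing counter into a
// per-interval rate by remembering the previous snapshot.
class CounterRateSampler {
  private long lastValue;
  private long lastTimeMs;

  CounterRateSampler(long initialValue, long initialTimeMs) {
    this.lastValue = initialValue;
    this.lastTimeMs = initialTimeMs;
  }

  /** Returns events per second since the previous sample (0 if no time passed). */
  synchronized double sample(long currentValue, long nowMs) {
    long deltaValue = currentValue - lastValue;
    long deltaMs = nowMs - lastTimeMs;
    lastValue = currentValue;
    lastTimeMs = nowMs;
    if (deltaMs <= 0) {
      return 0.0;
    }
    return deltaValue * 1000.0 / deltaMs;
  }
}
```

For example, a counter that goes from 100 to 160 over 10 seconds yields 6.0 events per second, and a flat counter yields 0.0, which is something an alert can sensibly fire on. A monitoring system can also compute the same delta at query time from the raw counter; the sketch only shows the arithmetic.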





Issue Time Tracking
---

Worklog Id: (was: 554448)
Time Spent: 4h 20m  (was: 4h 10m)

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.
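A minimal sketch of the two proposed counters, assuming a lock hold "exceeds" the threshold when its duration is strictly greater than it, and that the caller measures the hold time when the lock is released. `LockHoldWarningCounters` and its methods are hypothetical; only the metric names ReadLockWarning/WriteLockWarning come from the issue, and JMX registration is left out:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: count read/write lock holds that exceed a warning threshold.
class LockHoldWarningCounters {
  private final long readThresholdMs;
  private final long writeThresholdMs;
  private final AtomicLong readLockWarning = new AtomicLong();
  private final AtomicLong writeLockWarning = new AtomicLong();

  LockHoldWarningCounters(long readThresholdMs, long writeThresholdMs) {
    this.readThresholdMs = readThresholdMs;
    this.writeThresholdMs = writeThresholdMs;
  }

  /** Called on read unlock with how long the lock was held. */
  void onReadUnlock(long heldMs) {
    if (heldMs > readThresholdMs) {
      readLockWarning.incrementAndGet();
    }
  }

  /** Called on write unlock with how long the lock was held. */
  void onWriteUnlock(long heldMs) {
    if (heldMs > writeThresholdMs) {
      writeLockWarning.incrementAndGet();
    }
  }

  // Getters a metrics/JMX layer could read (exposure not shown).
  long getReadLockWarning() {
    return readLockWarning.get();
  }

  long getWriteLockWarning() {
    return writeLockWarning.get();
  }
}
```

Because both values only ever grow, plotting or alerting on them raises the concern in the review comment on this issue; sampling them into a per-interval rate is one way to address it.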






[jira] [Work logged] (HDFS-15808) Add metrics for FSNamesystem read/write lock hold long time

2021-02-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15808?focusedWorklogId=554447=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554447
 ]

ASF GitHub Bot logged work on HDFS-15808:
-

Author: ASF GitHub Bot
Created on: 18/Feb/21 20:18
Start Date: 18/Feb/21 20:18
Worklog Time Spent: 10m 
  Work Description: shvachko commented on pull request #2668:
URL: https://github.com/apache/hadoop/pull/2668#issuecomment-781609671


   Just reposting my comment on the jira for visibility.
   
   The patch looks fine, but I doubt the metric will be useful in its current 
form. A monotonically increasing counter doesn't tell you much when plotted: over 
time it just becomes an incredibly large number, and its fluctuations are hard to see. 
You also cannot set alerts for when the threshold is exceeded often.
   See e.g. ExpiredHeartbeats or LastWrittenTransactionId, which are not useful for this reason.
   I assume you need something like a rate.





Issue Time Tracking
---

Worklog Id: (was: 554447)
Time Spent: 4h 10m  (was: 4h)

> Add metrics for FSNamesystem read/write lock hold long time
> ---
>
> Key: HDFS-15808
> URL: https://issues.apache.org/jira/browse/HDFS-15808
> Project: Hadoop HDFS
>  Issue Type: Wish
>  Components: hdfs
>Reporter: tomscut
>Assignee: tomscut
>Priority: Major
>  Labels: hdfs, lock, metrics, pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> To monitor how often read/write locks exceed thresholds, we can add two 
> metrics(ReadLockWarning/WriteLockWarning), which are exposed in JMX.






[jira] [Commented] (HDFS-15839) RBF: Cannot get method setBalancerBandwidth on Router Client

2021-02-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286568#comment-17286568
 ] 

Ayush Saxena commented on HDFS-15839:
-

Thanks [~hadoop_yangyun] for the patch; the prod change looks good.
But regarding the test, I'm not sure what you want to do with the caller context 
there. I don't think that is required; just checking that setBalancerBandwidth worked 
is fine.

> RBF: Cannot get method setBalancerBandwidth on Router Client
> 
>
> Key: HDFS-15839
> URL: https://issues.apache.org/jira/browse/HDFS-15839
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Major
> Attachments: HDFS-15839.patch
>
>
> Calling setBalancerBandwidth throws an exception:
> {code:java}
> 02-18 14:39:59,186 [IPC Server handler 0 on default port 43545] ERROR router.RemoteMethod (RemoteMethod.java:getMethod(146)) - Cannot get method setBalancerBandwidth with types [class java.lang.Long] from ClientProtocol
> java.lang.NoSuchMethodException: org.apache.hadoop.hdfs.protocol.ClientProtocol.setBalancerBandwidth(java.lang.Long)
>  at java.lang.Class.getDeclaredMethod(Class.java:2130)
>  at org.apache.hadoop.hdfs.server.federation.router.RemoteMethod.getMethod(RemoteMethod.java:140)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1312)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1250)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1221)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeConcurrent(RouterRpcClient.java:1194)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.setBalancerBandwidth(RouterClientProtocol.java:1188)
>  at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.setBalancerBandwidth(RouterRpcServer.java:1211)
>  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setBalancerBandwidth(ClientNamenodeProtocolServerSideTranslatorPB.java:1254)
>  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>  at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:537)
>  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1037)
>  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:965)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2972)
> {code}
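The log reports a lookup with types `[class java.lang.Long]`. Assuming `ClientProtocol.setBalancerBandwidth` declares a primitive `long` parameter, the lookup fails because `Class.getDeclaredMethod` matches parameter types exactly and does not unbox `Long.class` to `long.class`. The sketch below reproduces this with a hypothetical `BandwidthProtocol` interface (a stand-in, not the real `ClientProtocol`):

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a protocol interface whose method takes a primitive long.
interface BandwidthProtocol {
  void setBalancerBandwidth(long bandwidth);
}

class ReflectionLookupDemo {
  /** Returns true if setBalancerBandwidth can be found with the given parameter type. */
  static boolean canFind(Class<?> paramType) {
    try {
      Method m = BandwidthProtocol.class.getDeclaredMethod("setBalancerBandwidth", paramType);
      return m.getName().equals("setBalancerBandwidth");
    } catch (NoSuchMethodException e) {
      // Reflection does not apply boxing/unboxing when matching parameter types.
      return false;
    }
  }
}
```

Here `canFind(Long.class)` returns false and `canFind(long.class)` returns true, which matches the `NoSuchMethodException` in the trace; describing the parameter type as `long.class` is one plausible way to make such a lookup succeed.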


