[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2018-04-24 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450833#comment-16450833
 ] 

Hudson commented on HDFS-10816:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HDFS-10816. TestComputeInvalidateWork#testDatanodeReRegistration fails (xyao: 
rev 14f782b6b960a818e0927edc7e32eb1fa51a2d08)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java


> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
>Priority: Major
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-10816-branch-2.002.patch, HDFS-10816.001.patch, 
> HDFS-10816.002.patch, HDFS-10816.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs 
> expected:<3> but was:<2>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the 
> replication monitor. The default replication monitor interval is 3 seconds, 
> which is about how long the test normally takes to run. The test deletes 
> a file and then acquires the namesystem write lock. However, if the 
> replication monitor fires between those two steps, the test will fail 
> because the monitor will itself invalidate one of the blocks. This can be 
> easily reproduced by removing the sleep() in the ReplicationMonitor's run() 
> method in BlockManager.java, so that the replication monitor executes as 
> quickly as possible and exacerbates the race. 
> To fix the test, all that needs to be done is to turn off the replication 
> monitor. 
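
The interleaving described in the issue can be modeled with a minimal, single-threaded sketch. It is not Hadoop code: the class MonitorRaceSketch, its methods, and the datanode names dn1 to dn3 are hypothetical stand-ins, and the monitor's pass is invoked explicitly at the point where the real replication monitor could fire, between the delete and the write-lock acquisition.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of the race; these names are illustrative, not Hadoop APIs.
class MonitorRaceSketch {
    // Datanodes that still hold a replica waiting to be invalidated.
    private final Set<String> pendingInvalidates = new LinkedHashSet<>();
    private boolean monitorRunning = true;

    // Models "turn off the replication monitor".
    void stopMonitor() {
        monitorRunning = false;
    }

    // Deleting the file queues one invalidation per datanode holding a replica.
    void deleteFile(String... datanodes) {
        for (String dn : datanodes) {
            pendingInvalidates.add(dn);
        }
    }

    // One pass of the background monitor: dispatch work for a single datanode.
    void monitorPass() {
        if (!monitorRunning) {
            return;
        }
        Iterator<String> it = pendingInvalidates.iterator();
        if (it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    int pendingCount() {
        return pendingInvalidates.size();
    }

    // Returns the count the test would observe after taking the write lock.
    static int observedCount(boolean stopMonitorFirst) {
        MonitorRaceSketch sketch = new MonitorRaceSketch();
        if (stopMonitorFirst) {
            sketch.stopMonitor();
        }
        sketch.deleteFile("dn1", "dn2", "dn3");
        // Worst-case interleaving: the monitor fires before the test takes the lock.
        sketch.monitorPass();
        return sketch.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println("monitor running: " + observedCount(false)); // 2
        System.out.println("monitor stopped: " + observedCount(true));  // 3
    }
}
```

With the monitor left running, the modeled count drops to 2, which matches the "expected:<3> but was:<2>" assertion failure; with the monitor stopped first, the count stays at 3.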



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037595#comment-16037595
 ] 

Hudson commented on HDFS-10816:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11825 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/11825/])
HDFS-10816. TestComputeInvalidateWork#testDatanodeReRegistration fails (kihwal: 
rev e4e203e0807fafc5dd765344d008e42bd51cc979)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java


> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037486#comment-16037486
 ] 

Kihwal Lee commented on HDFS-10816:
---

+1

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037439#comment-16037439
 ] 

Eric Badger commented on HDFS-10816:


Precommit test failures look unrelated

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037095#comment-16037095
 ] 

Hadoop QA commented on HDFS-10816:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 33s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 56s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 43s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 40s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 87m 49s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}114m 22s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12871231/HDFS-10816.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 7e07d98ed317 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 46f7e91 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816.002.patch, HDFS-10816-branch-2.002.patch
>
>

[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037082#comment-16037082
 ] 

Hadoop QA commented on HDFS-10816:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 18s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 38s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 40s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 43s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 51s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 57s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 59s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 41s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  8s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 56s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 50m 11s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_131. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 19s{color} | {color:red} The patch generated 1 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}126m 51s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Timed out junit tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| JDK v1.7.0_131 Failed junit tests | hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics |
|   | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
|   | hadoop.hdfs.web.TestWebHDFS |
|   | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:5e40efe |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870804/HDFS-10816-branch-2.002.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux dc98d1d75698 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 2e8557d |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |  

[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-02 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034741#comment-16034741
 ] 

Eric Badger commented on HDFS-10816:


Not sure why hadoopqa isn't running on the latest patches. [~kihwal], can you 
kick the hadoopqa bot?

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch, HDFS-10816.002.patch, 
> HDFS-10816-branch-2.002.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033058#comment-16033058
 ] 

Kihwal Lee commented on HDFS-10816:
---

The patch needs to be revised. {{BlockManagerTestUtil}} no longer has a 
{{stopReplicationThread()}} method.

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2017-06-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033048#comment-16033048
 ] 

Kihwal Lee commented on HDFS-10816:
---

+1

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450042#comment-15450042
 ] 

Rushabh S Shah commented on HDFS-10816:
---

Forgot to mention +1 (non-binding)

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450035#comment-15450035
 ] 

Rushabh S Shah commented on HDFS-10816:
---

[~ebadger]: Thanks for reporting and analyzing the failure.
This test broke in our internal build recently.
Below are the relevant logs:
{noformat}
2016-08-29 01:54:49,332 INFO  impl.RamDiskAsyncLazyPersistService (RamDiskAsyncLazyPersistService.java:shutdown(169)) - All async lazy persist service threads have been shut down
2016-08-29 01:54:49,336 INFO  datanode.DataNode (DataNode.java:shutdown(1791)) - Shutdown complete.
2016-08-29 01:54:49,347 INFO  BlockStateChange (BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637 
2016-08-29 01:54:49,349 INFO  FSNamesystem.audit (FSNamesystem.java:logAuditMessage(8476)) - allowed=true   ugi=tortuga (auth:SIMPLE)   ip=/127.0.0.1   cmd=delete  src=/testRR dst=null perm=null   proto=rpc
2016-08-29 01:54:49,350 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 127.0.0.1:59637 to delete [blk_1073741825_1001]
2016-08-29 01:54:49,355 INFO  hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1725)) - Shutting down the Mini HDFS Cluster
{noformat}

bq. 2016-08-29 01:54:49,336 INFO  datanode.DataNode (DataNode.java:shutdown(1791)) - Shutdown complete.
This line corresponds to shutting down the last datanode.
bq. 2016-08-29 01:54:49,347 INFO  BlockStateChange (BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637 
After stopping the last datanode, I can see that the InvalidateBlocks size is 3.
bq. 2016-08-29 01:54:49,350 INFO  BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 127.0.0.1:59637 to delete \[blk_1073741825_1001\]
Then the replication monitor woke up and removed one block from the 
invalidateBlocks set.

I think the test checked the invalidateBlocks size just after the 
replication monitor computed invalidate work for one node, and so the 
assertion failed.
I think stopping the replication monitor is the correct fix.
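
To make the accounting in these log lines concrete, here is a small sketch that replays the two BlockStateChange events against a per-datanode map. It is a simplified, hypothetical stand-in: the class InvalidateLogReplay and its methods borrow the method names printed in the log, but they are not Hadoop's InvalidateBlocks or BlockManager implementation.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping model; not Hadoop's InvalidateBlocks class.
class InvalidateLogReplay {
    // Blocks waiting to be invalidated, grouped by datanode address.
    private final Map<String, Set<String>> pendingByNode = new LinkedHashMap<>();

    // Queue the block for invalidation on every datanode that holds a replica.
    void addToInvalidates(String block, String... nodes) {
        for (String node : nodes) {
            pendingByNode.computeIfAbsent(node, k -> new LinkedHashSet<>()).add(block);
        }
    }

    // Hand one node its queued blocks, removing them from the pending set.
    Set<String> invalidateWorkForOneNode(String node) {
        Set<String> work = pendingByNode.remove(node);
        return work == null ? new LinkedHashSet<>() : work;
    }

    // Total number of (node, block) entries still pending.
    int pendingCount() {
        int total = 0;
        for (Set<String> blocks : pendingByNode.values()) {
            total += blocks.size();
        }
        return total;
    }

    // Replays the two logged events and returns {count before, count after}.
    static int[] replay() {
        InvalidateLogReplay state = new InvalidateLogReplay();
        state.addToInvalidates("blk_1073741825_1001",
            "127.0.0.1:57662", "127.0.0.1:43137", "127.0.0.1:59637");
        int before = state.pendingCount();
        state.invalidateWorkForOneNode("127.0.0.1:59637");
        int after = state.pendingCount();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = replay();
        System.out.println("before=" + counts[0] + " after=" + counts[1]);
    }
}
```

In this model the pending count is 3 right after the delete and 2 once work for 127.0.0.1:59637 has been handed out, which is the value the failing assertion observed.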

[~jojochuang], [~zhz]: Since you reviewed HDFS-9580, can you please help 
review this patch?

> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger
>Assignee: Eric Badger
> Attachments: HDFS-10816.001.patch
>
>



[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Eric Badger (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449672#comment-15449672
 ] 

Eric Badger commented on HDFS-10816:


The test failure (hadoop.hdfs.TestPersistBlocks) is unrelated to the patch.




[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor

2016-08-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449652#comment-15449652
 ] 

Hadoop QA commented on HDFS-10816:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 19s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 11s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 76m 21s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 99m 3s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPersistBlocks |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12826196/HDFS-10816.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 0538865c37d7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / af50860 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race 
> between test and replication monitor
> ---
>
> Key: HDFS-10816
> URL: https://issues.apache.org/jira/browse/HDFS-10816
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Eric Badger