[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450833#comment-16450833 ]

Hudson commented on HDFS-10816:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14057 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14057/])
HDFS-10816. TestComputeInvalidateWork#testDatanodeReRegistration fails (xyao: rev 14f782b6b960a818e0927edc7e32eb1fa51a2d08)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java


> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race
> between test and replication monitor
> ---
>
>                 Key: HDFS-10816
>                 URL: https://issues.apache.org/jira/browse/HDFS-10816
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>            Priority: Major
>             Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
>         Attachments: HDFS-10816-branch-2.002.patch, HDFS-10816.001.patch,
> HDFS-10816.002.patch, HDFS-10816.002.patch
>
>
> {noformat}
> java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:743)
>     at org.junit.Assert.assertEquals(Assert.java:118)
>     at org.junit.Assert.assertEquals(Assert.java:555)
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
> {noformat}
> The test fails because of a race condition between the test and the
> replication monitor. The default replication monitor interval is 3 seconds,
> which is just about how long the test normally takes to run. The test deletes
> a file and then acquires the namesystem write lock. However, if the
> replication monitor fires between those two steps, the test will fail
> because the monitor will itself invalidate one of the blocks. This can easily
> be reproduced by removing the sleep() in the ReplicationMonitor's run() method
> in BlockManager.java, so that the replication monitor runs as often as
> possible and exacerbates the race.
> To fix the test, all that needs to be done is to turn off the replication
> monitor.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
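The race described above can be reduced to a few lines of plain Java. This is a minimal sketch under stated assumptions, not the test or the committed patch: `InvalidateRaceSketch`, `deleteFile`, `monitorPass` and `runScenario` are hypothetical names, a `ReentrantReadWriteLock` stands in for the namesystem write lock, and a list stands in for the pending invalidations (one entry per datanode). A `CountDownLatch` forces the worst-case interleaving, where the monitor fires between the delete and the check, so the outcome does not depend on timing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the HDFS-10816 race; none of these names are Hadoop APIs.
public class InvalidateRaceSketch {
  // Stand-in for the namesystem write lock.
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();
  // Stand-in for pending invalidations: one entry per datanode holding a replica.
  private final List<String> invalidateBlocks = new ArrayList<>();

  // Models deleting a file whose single block has a replica on every datanode.
  void deleteFile(int numDatanodes) {
    nsLock.writeLock().lock();
    try {
      for (int i = 0; i < numDatanodes; i++) {
        invalidateBlocks.add("dn" + i);
      }
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  // Models one monitor pass that hands invalidation work to a single datanode.
  void monitorPass() {
    nsLock.writeLock().lock();
    try {
      if (!invalidateBlocks.isEmpty()) {
        invalidateBlocks.remove(0);
      }
    } finally {
      nsLock.writeLock().unlock();
    }
  }

  // Models the test body: delete, then take the write lock and count.
  // With monitorEnabled, a latch forces the monitor to run inside that window.
  static int runScenario(int numDatanodes, boolean monitorEnabled) {
    InvalidateRaceSketch nn = new InvalidateRaceSketch();
    CountDownLatch deleted = new CountDownLatch(1);
    Thread monitor = new Thread(() -> {
      try {
        deleted.await();   // wait until the "test" has deleted the file
        nn.monitorPass();  // then fire before the test takes the lock
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    if (monitorEnabled) {
      monitor.start();
    }
    nn.deleteFile(numDatanodes);
    deleted.countDown();
    if (monitorEnabled) {
      try {
        monitor.join();    // worst case: the monitor wins the race
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    nn.nsLock.writeLock().lock();
    try {
      return nn.invalidateBlocks.size();
    } finally {
      nn.nsLock.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println("monitor enabled:  " + runScenario(3, true));   // 2
    System.out.println("monitor disabled: " + runScenario(3, false));  // 3
  }
}
```

With the monitor enabled the count comes out as 2 instead of 3, the same shape as the `expected:<3> but was:<2>` failure; with the monitor disabled the count is always 3, which is why turning the monitor off removes the race rather than just narrowing it.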
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037595#comment-16037595 ]

Hudson commented on HDFS-10816:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11825 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11825/])
HDFS-10816. TestComputeInvalidateWork#testDatanodeReRegistration fails (kihwal: rev e4e203e0807fafc5dd765344d008e42bd51cc979)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestComputeInvalidateWork.java
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037486#comment-16037486 ]

Kihwal Lee commented on HDFS-10816:
---

+1
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037439#comment-16037439 ]

Eric Badger commented on HDFS-10816:
---

Precommit test failures look unrelated.
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037095#comment-16037095 ]

Hadoop QA commented on HDFS-10816:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 33s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 13m 56s | trunk passed |
| +1 | compile | 0m 46s | trunk passed |
| +1 | checkstyle | 0m 38s | trunk passed |
| +1 | mvnsite | 0m 55s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 43s | trunk passed |
| +1 | mvninstall | 0m 54s | the patch passed |
| +1 | compile | 0m 46s | the patch passed |
| +1 | javac | 0m 46s | the patch passed |
| +1 | checkstyle | 0m 36s | the patch passed |
| +1 | mvnsite | 0m 56s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 48s | the patch passed |
| +1 | javadoc | 0m 40s | the patch passed |
| -1 | unit | 87m 49s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 114m 22s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12871231/HDFS-10816.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 7e07d98ed317 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 46f7e91 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/19774/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16037082#comment-16037082 ]

Hadoop QA commented on HDFS-10816:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 38s | branch-2 passed |
| +1 | compile | 0m 40s | branch-2 passed with JDK v1.8.0_131 |
| +1 | compile | 0m 43s | branch-2 passed with JDK v1.7.0_131 |
| +1 | checkstyle | 0m 27s | branch-2 passed |
| +1 | mvnsite | 0m 51s | branch-2 passed |
| +1 | findbugs | 1m 57s | branch-2 passed |
| +1 | javadoc | 0m 38s | branch-2 passed with JDK v1.8.0_131 |
| +1 | javadoc | 0m 59s | branch-2 passed with JDK v1.7.0_131 |
| +1 | mvninstall | 0m 43s | the patch passed |
| +1 | compile | 0m 37s | the patch passed with JDK v1.8.0_131 |
| +1 | javac | 0m 37s | the patch passed |
| +1 | compile | 0m 41s | the patch passed with JDK v1.7.0_131 |
| +1 | javac | 0m 41s | the patch passed |
| +1 | checkstyle | 0m 24s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 0m 35s | the patch passed with JDK v1.8.0_131 |
| +1 | javadoc | 0m 56s | the patch passed with JDK v1.7.0_131 |
| -1 | unit | 50m 11s | hadoop-hdfs in the patch failed with JDK v1.7.0_131. |
| -1 | asflicense | 0m 19s | The patch generated 1 ASF License warnings. |
| | | 126m 51s | |

|| Reason || Tests ||
| JDK v1.8.0_131 Timed out junit tests | org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
| JDK v1.7.0_131 Failed junit tests | hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics |
| | hadoop.hdfs.server.balancer.TestBalancerRPCDelay |
| | hadoop.hdfs.web.TestWebHDFS |
| | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:5e40efe |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12870804/HDFS-10816-branch-2.002.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux dc98d1d75698 3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | branch-2 / 2e8557d |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16034741#comment-16034741 ]

Eric Badger commented on HDFS-10816:
---

Not sure why hadoopqa isn't running on the latest patches. [~kihwal], can you kick the hadoopqa bot?
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033058#comment-16033058 ]

Kihwal Lee commented on HDFS-10816:
---

The patch needs to be revised. {{BlockManagerTestUtil}} no longer has a {{stopReplicationThread()}} method.
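Since the helper the first patch relied on is gone, the test needs some other supported way to keep the monitor from running. Purely as a generic illustration (this is not Hadoop's `BlockManagerTestUtil` or `BlockManager` code, and the thread does not say what replaced `stopReplicationThread()`), a periodic monitor with a test hook might look like the sketch below. `PeriodicMonitor`, `stopAndWait` and `isRunning` are hypothetical names; the property that matters is that `stopAndWait()` both interrupts and joins the worker, so once it returns no further pass can run.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical periodic monitor with a test hook; not Hadoop's ReplicationMonitor.
public class PeriodicMonitor {
  private final long intervalMillis;
  private final Runnable work;
  private final AtomicInteger passes = new AtomicInteger();
  private final Thread worker;

  public PeriodicMonitor(long intervalMillis, Runnable work) {
    this.intervalMillis = intervalMillis;
    this.work = work;
    this.worker = new Thread(this::loop, "periodic-monitor");
    this.worker.setDaemon(true);
  }

  // Runs one pass, then sleeps for the interval, until interrupted.
  private void loop() {
    while (!Thread.currentThread().isInterrupted()) {
      work.run();
      passes.incrementAndGet();
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        return; // stop requested (sleep also throws at once if already interrupted)
      }
    }
  }

  public void start() {
    worker.start();
  }

  // Test hook: interrupt the worker and wait for it to exit.
  public void stopAndWait() {
    worker.interrupt();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while stopping monitor", e);
    }
  }

  public boolean isRunning() {
    return worker.isAlive();
  }

  public int passes() {
    return passes.get();
  }

  public static void main(String[] args) throws InterruptedException {
    PeriodicMonitor monitor = new PeriodicMonitor(10, () -> { });
    monitor.start();
    Thread.sleep(50);
    monitor.stopAndWait();
    int before = monitor.passes();
    Thread.sleep(50);
    System.out.println(monitor.isRunning());         // false
    System.out.println(before == monitor.passes());  // true
  }
}
```

Interrupting without joining would not be enough: the worker could still be in the middle of a pass when the test takes the lock, which reopens the same window.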
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16033048#comment-16033048 ]

Kihwal Lee commented on HDFS-10816:
---

+1
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450042#comment-15450042 ]

Rushabh S Shah commented on HDFS-10816:
---

Forgot to mention +1 (non-binding)
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450035#comment-15450035 ]

Rushabh S Shah commented on HDFS-10816:
---

[~ebadger]: Thanks for reporting and analyzing the failure.
This test broke in our internal build recently. Below are the relevant logs:
{noformat}
2016-08-29 01:54:49,332 INFO impl.RamDiskAsyncLazyPersistService (RamDiskAsyncLazyPersistService.java:shutdown(169)) - All async lazy persist service threads have been shut down
2016-08-29 01:54:49,336 INFO datanode.DataNode (DataNode.java:shutdown(1791)) - Shutdown complete.
2016-08-29 01:54:49,347 INFO BlockStateChange (BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637
2016-08-29 01:54:49,349 INFO FSNamesystem.audit (FSNamesystem.java:logAuditMessage(8476)) - allowed=true ugi=tortuga (auth:SIMPLE) ip=/127.0.0.1 cmd=delete src=/testRR dst=null perm=null proto=rpc
2016-08-29 01:54:49,350 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 127.0.0.1:59637 to delete [blk_1073741825_1001]
2016-08-29 01:54:49,355 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1725)) - Shutting down the Mini HDFS Cluster
{noformat}

bq. 2016-08-29 01:54:49,336 INFO datanode.DataNode (DataNode.java:shutdown(1791)) - Shutdown complete.

This line corresponds to shutting down the last datanode.

bq. 2016-08-29 01:54:49,347 INFO BlockStateChange (BlockManager.java:addToInvalidates(1228)) - BLOCK* addToInvalidates: blk_1073741825_1001 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637

After stopping the last datanode, I can see that the InvalidateBlocks size is 3.

bq. 2016-08-29 01:54:49,350 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3582)) - BLOCK* BlockManager: ask 127.0.0.1:59637 to delete [blk_1073741825_1001]

Then the replication monitor woke up and removed one block (the one queued for 127.0.0.1:59637) from the invalidateBlocks set.
I think the test checked the invalidateBlocks size just after the replication monitor had computed invalidate work for one node, and that is why it failed.
I think stopping the replication monitor is the correct fix.

[~jojochuang], [~zhz]: Since you reviewed HDFS-9580, could you please help review this patch?
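The log walk-through above can also be replayed mechanically. This sketch assumes the exact `addToInvalidates: <block> <datanode>...` and `ask <datanode> to delete [<block>]` message shapes quoted in the logs; `InvalidateLogReplay` and `replay` are hypothetical names, not Hadoop code. Each `addToInvalidates` line adds one pending (datanode, block) entry per listed datanode, and each `ask ... to delete` line removes the entry for the named datanode.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical replay of the BlockStateChange lines quoted above; not Hadoop code.
public class InvalidateLogReplay {
  // Assumed shape: "addToInvalidates: <block> <dn> <dn> ..."
  private static final Pattern ADD =
      Pattern.compile("addToInvalidates: (\\S+)((?: \\S+)+)");
  // Assumed shape: "ask <dn> to delete [<block>, <block>, ...]"
  private static final Pattern ASK =
      Pattern.compile("ask (\\S+) to delete \\[([^\\]]+)\\]");

  // Returns the "<datanode>/<block>" entries still pending after the replay.
  static Set<String> replay(List<String> lines) {
    Set<String> pending = new LinkedHashSet<>();
    for (String line : lines) {
      Matcher add = ADD.matcher(line);
      if (add.find()) {
        String block = add.group(1);
        for (String dn : add.group(2).trim().split(" ")) {
          pending.add(dn + "/" + block);  // one entry per datanode replica
        }
        continue;
      }
      Matcher ask = ASK.matcher(line);
      if (ask.find()) {
        for (String block : ask.group(2).split(",\\s*")) {
          pending.remove(ask.group(1) + "/" + block);  // work handed to that datanode
        }
      }
    }
    return pending;
  }

  public static void main(String[] args) {
    List<String> log = Arrays.asList(
        "2016-08-29 01:54:49,347 INFO BlockStateChange (BlockManager.java:addToInvalidates(1228))"
            + " - BLOCK* addToInvalidates: blk_1073741825_1001"
            + " 127.0.0.1:57662 127.0.0.1:43137 127.0.0.1:59637",
        "2016-08-29 01:54:49,350 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3582))"
            + " - BLOCK* BlockManager: ask 127.0.0.1:59637 to delete [blk_1073741825_1001]");
    Set<String> pending = replay(log);
    System.out.println(pending.size());  // 2
  }
}
```

Replaying the two `BlockStateChange` lines leaves two pending entries, matching the `expected:<3> but was:<2>` assertion.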
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449672#comment-15449672 ] Eric Badger commented on HDFS-10816: The test failure is unrelated to the patch
[jira] [Commented] (HDFS-10816) TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
[ https://issues.apache.org/jira/browse/HDFS-10816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449652#comment-15449652 ] Hadoop QA commented on HDFS-10816: --

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 55s | trunk passed |
| +1 | compile | 0m 59s | trunk passed |
| +1 | checkstyle | 0m 31s | trunk passed |
| +1 | mvnsite | 1m 11s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 2m 1s | trunk passed |
| +1 | javadoc | 1m 0s | trunk passed |
| +1 | mvninstall | 1m 0s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 24s | the patch passed |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 54s | the patch passed |
| -1 | unit | 76m 21s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 99m 3s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestPersistBlocks |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Issue | HDFS-10816 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12826196/HDFS-10816.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 0538865c37d7 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / af50860 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/16577/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
> TestComputeInvalidateWork#testDatanodeReRegistration fails due to race > between test and replication monitor > --- > > Key: HDFS-10816 > URL: https://issues.apache.org/jira/browse/HDFS-10816 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Eric Badger