[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158133#comment-15158133 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-1.1-JDK8 #1755 (See [https://builds.apache.org/job/HBase-1.1-JDK8/1755/]) HBASE-15169 Backport HBASE-14362 'TestWALProcedureStoreOnHDFS is super (chenheng: rev 10d607717257af8e83368e34bc32f98f98389dc8) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java > org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super > duper flaky > - > > Key: HBASE-14362 > URL: https://issues.apache.org/jira/browse/HBASE-14362 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0 >Reporter: Dima Spivak >Assignee: Heng Chen >Priority: Critical > Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 > > Attachments: HBASE-14362.patch, HBASE-14362.patch, > HBASE-14362_v1.patch > > > [As seen in > Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/], > this test has been super flaky and we should probably address it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157992#comment-15157992 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-1.1-JDK7 #1668 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1668/]) HBASE-15169 Backport HBASE-14362 'TestWALProcedureStoreOnHDFS is super (chenheng: rev 10d607717257af8e83368e34bc32f98f98389dc8) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934843#comment-14934843 ] Heng Chen commented on HBASE-14362: --- {quote} Using waitForNumReplicas() invalidates the test. The point of the test is to verify what happens when there are not enough replicas. {quote} Yeah, I made a stupid mistake. There is another way to fix this: we can catch the exception thrown by {{syncSlots(stream, slots, 0, slotIndex)}}. I have updated the patch. Any concerns, [~mbertozzi]?
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935217#comment-14935217 ] Matteo Bertozzi commented on HBASE-14362: - In that way we just lose data. The exception in syncSlots(Stream, slots, offset, count) is already caught in syncSlots(). If we are not able to write/sync, we try to close the current WAL and open a new one. We try N times; if we still can't, we give up, because maybe that machine is no longer able to talk to HDFS or something like that, and we let the backup master try. For the test we just need to increase the number of retries, so that it can cope with the slow test machines.
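A minimal sketch of the roll-and-retry policy described in this comment, under a simplified model. `Wal`, `syncWithRoll`, `FlakyCluster` and `survives` are hypothetical names, not the HBase `WALProcedureStore` API, and the simulated datanode simply recovers after a fixed number of failed syncs.

```java
import java.io.IOException;
import java.util.function.Supplier;

final class RollOnSyncFailureSketch {

  /** Hypothetical stand-in for an open WAL stream on HDFS. */
  interface Wal {
    void sync() throws IOException;
    void close();
  }

  /**
   * Syncs the current WAL; on failure closes it, rolls to a fresh one and retries,
   * up to maxRetries times. Throws (instead of dropping data) once retries run out.
   */
  static Wal syncWithRoll(Wal current, Supplier<Wal> roll, int maxRetries) throws IOException {
    Wal wal = current;
    IOException last = null;
    for (int attempt = 0; attempt <= maxRetries; attempt++) {
      if (attempt > 0) {
        wal.close();      // abandon the stream whose sync failed
        wal = roll.get(); // open a new log file and try again
      }
      try {
        wal.sync();
        return wal;       // synced: nothing was lost
      } catch (IOException e) {
        last = e;
      }
    }
    wal.close();
    // Give up and surface the failure; per the discussion, a backup master would then take over.
    throw new IOException("sync failed after " + maxRetries + " roll retries", last);
  }

  /** Simulated cluster whose datanode comes back after a fixed number of failed syncs. */
  static final class FlakyCluster {
    private int failuresLeft;

    FlakyCluster(int failuresBeforeRecovery) {
      this.failuresLeft = failuresBeforeRecovery;
    }

    Wal open() {
      return new Wal() {
        @Override
        public void sync() throws IOException {
          if (failuresLeft > 0) {
            failuresLeft--;
            throw new IOException("simulated: not enough replicas");
          }
        }

        @Override
        public void close() {}
      };
    }
  }

  /** True if the store rides out the simulated outage with the given retry budget. */
  static boolean survives(int failuresBeforeRecovery, int maxRetries) {
    FlakyCluster cluster = new FlakyCluster(failuresBeforeRecovery);
    try {
      syncWithRoll(cluster.open(), cluster::open, maxRetries);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(survives(3, 1)); // false: gives up while the DN is still down
    System.out.println(survives(3, 5)); // true: enough retries to outlast the outage
  }
}
```

The failure is surfaced to the caller rather than swallowed, so raising `maxRetries` is the only knob that lets the same code outlast a longer (simulated) outage.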
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935015#comment-14935015 ] Hadoop QA commented on HBASE-14362: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764208/HBASE-14362.patch against master branch at commit 2ea70c7e6c70c4bd689b79718999a948001f3b21. ATTACHMENT ID: 12764208 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestImportExport org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS org.apache.hadoop.hbase.util.TestProcessBasedCluster org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures {color:red}-1 core zombie tests{color}. There are 4 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:236) at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1.testScanEmptyToBBB(TestTableInputFormatScan1.java:84) at org.apache.hadoop.hbase.mapreduce.TestCellCounter.testCellCounter(TestCellCounter.java:99) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2.testExcludeMinorCompaction(TestHFileOutputFormat2.java:1103) at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testWritingPEData(TestHFileOutputFormat.java:334) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15800//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15800//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15800//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15800//console This message is automatically generated.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935321#comment-14935321 ] Heng Chen commented on HBASE-14362: --- {quote} the only thing we can do if we want to verify the retry code is just bump the retries value. {quote} So your suggestion is to resolve the issue as invalid, or to bump the retries value?
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935302#comment-14935302 ] Heng Chen commented on HBASE-14362: --- {quote} in that way we just lose data {quote} Yeah, but we could use a class that extends {{WALProcedureStore}} and overrides {{syncSlots(Stream, slots, offset, count)}}. How about that?
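A sketch of the subclass-and-override idea, using hypothetical classes (`SlotStore`, `TolerantTestSlotStore`, `OverrideSketchDemo`) rather than the real `WALProcedureStore`, whose `syncSlots` signature differs. It also illustrates the concern raised in the thread: once the override swallows the failure, the caller believes the entry was stored.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a store with an overridable sync hook (not WALProcedureStore). */
class SlotStore {
  private final List<String> durable = new ArrayList<>();

  /** Syncs the given slots; a real store could fail here when HDFS lacks replicas. */
  protected void syncSlots(List<String> slots) throws IOException {
    durable.addAll(slots);
  }

  /** Entry point used by callers; sync failures propagate to them. */
  public void insert(String entry) throws IOException {
    syncSlots(Collections.singletonList(entry));
  }

  public List<String> durableEntries() {
    return Collections.unmodifiableList(durable);
  }
}

/** Test-only subclass that injects and then swallows the first N sync failures. */
class TolerantTestSlotStore extends SlotStore {
  private int failuresToInject;
  private int swallowed = 0;

  TolerantTestSlotStore(int failuresToInject) {
    this.failuresToInject = failuresToInject;
  }

  @Override
  protected void syncSlots(List<String> slots) throws IOException {
    try {
      if (failuresToInject > 0) {
        failuresToInject--;
        throw new IOException("simulated: not enough replicas");
      }
      super.syncSlots(slots);
    } catch (IOException e) {
      swallowed++; // the caller sees success, but these slots were never synced
    }
  }

  int swallowedFailures() {
    return swallowed;
  }
}

final class OverrideSketchDemo {
  static TolerantTestSlotStore run(int failures, String... entries) {
    TolerantTestSlotStore store = new TolerantTestSlotStore(failures);
    try {
      for (String entry : entries) {
        store.insert(entry);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return store;
  }

  public static void main(String[] args) {
    TolerantTestSlotStore store = run(1, "a", "b", "c");
    System.out.println(store.durableEntries());    // [b, c]: "a" was silently dropped
    System.out.println(store.swallowedFailures()); // 1
  }
}
```

The override keeps the test green, but it no longer exercises the production retry path, and entry `"a"` is lost without any error reaching the caller.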
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935310#comment-14935310 ] Matteo Bertozzi commented on HBASE-14362: - The code and the test are correct as they are. The test is flaky by nature, as the comment points out, since it is bound to the time it takes to restart the DN. The only thing we can do if we want to verify the retry code is just bump the retries value.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935328#comment-14935328 ] Matteo Bertozzi commented on HBASE-14362: - Yeah, bump the retry value. We want to verify this situation, and we know that the test will pass at some point; we just don't know how long the DN takes to get back online.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935956#comment-14935956 ] stack commented on HBASE-14362: --- Let's try it ([~mbertozzi] is good w/ it... too... )
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936127#comment-14936127 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-1.1 #684 (See [https://builds.apache.org/job/HBase-1.1/684/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev 591e52b9657e2039085187fe0c8eb613d475be8b) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936193#comment-14936193 ] Hudson commented on HBASE-14362: SUCCESS: Integrated in HBase-1.2-IT #175 (See [https://builds.apache.org/job/HBase-1.2-IT/175/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev a2db17b796e56f1e208c7a4c1ab83a4dcccbe05e) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936264#comment-14936264 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-1.3 #215 (See [https://builds.apache.org/job/HBase-1.3/215/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev 255fd07d357d37c469b20db4d8b7ab52400b0ba8) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936311#comment-14936311 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-TRUNK #6855 (See [https://builds.apache.org/job/HBase-TRUNK/6855/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev 4cb3e029b0fe8dabd972fb39aa24f1ff0ad69b4c) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936212#comment-14936212 ] Hudson commented on HBASE-14362: SUCCESS: Integrated in HBase-1.3-IT #191 (See [https://builds.apache.org/job/HBase-1.3-IT/191/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev 255fd07d357d37c469b20db4d8b7ab52400b0ba8) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936243#comment-14936243 ] Hudson commented on HBASE-14362: FAILURE: Integrated in HBase-1.2 #209 (See [https://builds.apache.org/job/HBase-1.2/209/]) HBASE-14362 org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky (Heng Chen) (stack: rev a2db17b796e56f1e208c7a4c1ab83a4dcccbe05e) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935686#comment-14935686 ] Hadoop QA commented on HBASE-14362: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764262/HBASE-14362_v1.patch against master branch at commit 37877e3f56b038c0821138862813e567390a9ff4. ATTACHMENT ID: 12764262 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.camel.component.jetty.jettyproducer.HttpJettyProducerRecipientListCustomThreadPoolTest.testRecipientList(HttpJettyProducerRecipientListCustomThreadPoolTest.java:40) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15808//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15808//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15808//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15808//console This message is automatically generated.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934596#comment-14934596 ] Hadoop QA commented on HBASE-14362: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12764151/HBASE-14362.patch against master branch at commit f6be2f9bf357d25e2b04afd20170ad20662b834f. ATTACHMENT ID: 12764151 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15792//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15792//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15792//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15792//console This message is automatically generated.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934427#comment-14934427 ] Heng Chen commented on HBASE-14362: --- If I set the params {code} conf.setInt("hbase.procedure.store.wal.max.roll.retries", 1); conf.setInt("hbase.procedure.store.wal.sync.failure.roll.max", 1); {code} this issue can be reproduced locally. I think the reason is the original logic: {code} store.insert(new TestProcedure(i, -1), null); waitForNumReplicas(3); {code} After we restart the DN, we insert immediately, and if the DN has not fully started yet, the test case fails. So we call {{waitForNumReplicas(3)}} before the insert; {{waitForNumReplicas(3)}} waits until the DN has fully started. After doing that, the test case no longer fails locally.
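A generic sketch of the wait-before-insert idea. `waitFor`, `pollsUntilReplicated` and `timesOut` are hypothetical helpers, not the test's `waitForNumReplicas()`, and the replica count is simulated rather than read from a real HDFS cluster.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

final class WaitForSketch {

  /** Polls condition every intervalMs until it holds; throws if timeoutMs elapses first. */
  static void waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs)
      throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  /** Simulates a DN restart: live replicas reach `expected` on the `pollsUntilUp`-th check. */
  static int pollsUntilReplicated(int expected, int pollsUntilUp) {
    int[] polls = {0};
    BooleanSupplier enoughReplicas = () -> {
      polls[0]++;
      int liveReplicas = polls[0] >= pollsUntilUp ? expected : expected - 1;
      return liveReplicas >= expected;
    };
    try {
      waitFor(enoughReplicas, 5_000, 1);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (TimeoutException e) {
      throw new IllegalStateException(e);
    }
    return polls[0];
  }

  /** True if waiting on a condition that never holds ends in a timeout. */
  static boolean timesOut(long timeoutMs) {
    try {
      waitFor(() -> false, timeoutMs, 1);
      return false;
    } catch (TimeoutException e) {
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(pollsUntilReplicated(3, 4)); // 4: the insert would run only after the 4th check
    System.out.println(timesOut(20));               // true
  }
}
```

Waiting this way removes the under-replicated window before the insert, which is the very situation the test is meant to exercise.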
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934548#comment-14934548 ] Matteo Bertozzi commented on HBASE-14362: - Using waitForNumReplicas() invalidates the test. The point of the test is to verify what happens when there are not enough replicas.
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745657#comment-14745657 ] Heng Chen commented on HBASE-14362: ---
{quote}
But most of these exceptions are caught in WALProcedureStore,
{quote}
This description is misleading. It should read: "But most of these exceptions are caught in {{WALProcedureStore.syncLoop}},"
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745664#comment-14745664 ] Matteo Bertozzi commented on HBASE-14362: -
If you look at the TestWALProcedureStoreOnHDFS code, it already says that the test is flaky on slow machines. You can bump the values as much as you want, but if the machine is even slower it will fail again. The other downside is that larger values slow down runs on faster machines.
{code}
// increase the value for slow test-env
conf.setInt("hbase.procedure.store.wal.wait.before.roll", 1000);
conf.setInt("hbase.procedure.store.wal.max.roll.retries", 5);
conf.setInt("hbase.procedure.store.wal.sync.failure.roll.max", 5);
{code}
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745644#comment-14745644 ] Heng Chen commented on HBASE-14362: ---
After analyzing the log, I found that the test case fails because of this exception:
{code}
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /test-logs/state-0018.log could only be replicated to 2 nodes instead of minReplication (=3). There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
{code}
There are many exceptions of this kind in the log, starting from the very beginning. But most of these exceptions are caught in {{WALProcedureStore}}, except the last one, which is thrown by {{syncSlots}} once {{logRolled}} reaches {{maxSyncFailureRoll}}:
{code}
private long syncSlots() throws Throwable {
  int retry = 0;
  int logRolled = 0;
  long totalSynced = 0;
  do {
    try {
      totalSynced = syncSlots(stream, slots, 0, slotIndex);
      break;
    } catch (Throwable e) {
      if (++retry >= maxRetriesBeforeRoll) {
        if (logRolled >= maxSyncFailureRoll) {
          LOG.error("Sync slots after log roll failed, abort.", e);
          sendAbortProcessSignal();
          throw e; // here the exception is thrown out, which causes syncLoop to exit!
        }
        if (!rollWriterOrDie()) {
          throw e;
        }
        logRolled++;
        retry = 0;
      }
    }
  } while (isRunning());
  return totalSynced;
}
{code}
So if I set {{hbase.procedure.store.wal.wait.before.roll}} and {{hbase.procedure.store.wal.sync.failure.roll.max}} to smaller values, the test case always fails. To fix this issue, we could either increase these values when the test environment is slow, or catch the exception.
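To make the failure budget in the quoted {{syncSlots}} loop concrete, here is a simplified, self-contained model of its control flow. The class and interface names are hypothetical, the model catches only {{IOException}} instead of {{Throwable}}, the {{isRunning()}} check and the abort signal are left out, and {{rollWriter}} stands in for {{rollWriterOrDie()}}. With {{maxRetriesBeforeRoll}} = R (at least 1) and {{maxSyncFailureRoll}} = M, a sync that keeps failing is attempted R * (M + 1) times, with M log rolls in between, before the exception escapes; per the analysis above, that is the point where {{syncLoop}} exits.

```java
import java.io.IOException;
import java.util.function.BooleanSupplier;

/** Hypothetical, simplified model of the retry/roll loop quoted from syncSlots(). */
final class SyncRetryModel {
  /** One attempt to sync the pending slots; may fail like the HDFS write above. */
  @FunctionalInterface
  interface SyncAttempt {
    long sync() throws IOException;
  }

  private final int maxRetriesBeforeRoll;
  private final int maxSyncFailureRoll;
  private int rolls; // successful log rolls performed by this model

  SyncRetryModel(int maxRetriesBeforeRoll, int maxSyncFailureRoll) {
    if (maxRetriesBeforeRoll < 1 || maxSyncFailureRoll < 0) {
      throw new IllegalArgumentException("need maxRetriesBeforeRoll >= 1, maxSyncFailureRoll >= 0");
    }
    this.maxRetriesBeforeRoll = maxRetriesBeforeRoll;
    this.maxSyncFailureRoll = maxSyncFailureRoll;
  }

  int rolls() {
    return rolls;
  }

  /** Closed form: failed sync attempts made before the exception is rethrown. */
  static int failedAttemptsBeforeAbort(int maxRetriesBeforeRoll, int maxSyncFailureRoll) {
    return maxRetriesBeforeRoll * (maxSyncFailureRoll + 1);
  }

  /** Same branching as the quoted loop, minus isRunning() and the abort signal. */
  long syncSlots(SyncAttempt attempt, BooleanSupplier rollWriter) throws IOException {
    int retry = 0;
    int logRolled = 0;
    while (true) {
      try {
        return attempt.sync();
      } catch (IOException e) {
        if (++retry >= maxRetriesBeforeRoll) {
          if (logRolled >= maxSyncFailureRoll) {
            throw e; // roll budget exhausted: this is the exception that ends syncLoop
          }
          if (!rollWriter.getAsBoolean()) {
            throw e; // rolling the log itself failed
          }
          logRolled++;
          rolls++;
          retry = 0;
        }
      }
    }
  }
}
```

For example, with R = 3 and M = 2 a persistently failing sync is attempted 9 times with 2 rolls before giving up. Assuming {{hbase.procedure.store.wal.sync.failure.roll.max}} feeds {{maxSyncFailureRoll}}, as its name suggests, lowering it shrinks M and therefore the whole budget, which is consistent with the observation that smaller values make the test fail every time.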
[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky
[ https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745575#comment-14745575 ] Heng Chen commented on HBASE-14362: ---
The issue reproduced just now! https://builds.apache.org/job/PreCommit-HBASE-Build/15596//testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/testWalRollOnLowReplication/