[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2016-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158133#comment-15158133
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-1.1-JDK8 #1755 (See 
[https://builds.apache.org/job/HBase-1.1-JDK8/1755/])
HBASE-15169 Backport HBASE-14362 'TestWALProcedureStoreOnHDFS is super 
(chenheng: rev 10d607717257af8e83368e34bc32f98f98389dc8)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java


> org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
> duper flaky
> -
>
> Key: HBASE-14362
> URL: https://issues.apache.org/jira/browse/HBASE-14362
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Dima Spivak
>Assignee: Heng Chen
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
>
> Attachments: HBASE-14362.patch, HBASE-14362.patch, 
> HBASE-14362_v1.patch
>
>
> [As seen in 
> Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/],
>  this test has been super flaky and we should probably address it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2016-02-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157992#comment-15157992
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-1.1-JDK7 #1668 (See 
[https://builds.apache.org/job/HBase-1.1-JDK7/1668/])
HBASE-15169 Backport HBASE-14362 'TestWALProcedureStoreOnHDFS is super 
(chenheng: rev 10d607717257af8e83368e34bc32f98f98389dc8)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934843#comment-14934843
 ] 

Heng Chen commented on HBASE-14362:
---

{quote}
Using waitForNumReplicas() invalidates the test.
The point of the test is to verify what happens when there are not enough replicas.
{quote}

Yeah, I made a stupid mistake...

There is another way to fix this: we can catch the exception thrown by
{{syncSlots(stream, slots, 0, slotIndex)}}.

I updated the patch. Any concerns? [~mbertozzi]

> org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
> duper flaky
> -
>
> Key: HBASE-14362
> URL: https://issues.apache.org/jira/browse/HBASE-14362
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Dima Spivak
>Priority: Critical
> Attachments: HBASE-14362.patch, HBASE-14362.patch
>
>
> [As seen in 
> Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/],
>  this test has been super flaky and we should probably address it.





[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935217#comment-14935217
 ] 

Matteo Bertozzi commented on HBASE-14362:
-

In that way we just lose data.
The exception in syncSlots(Stream, slots, offset, count) is already caught in
syncSlots().
If we are not able to write/sync, we try to close the current WAL and
open a new one.
We try N times; if we still can't, we give up, because maybe that machine is no longer
able to talk with HDFS or something like that, and we let the backup master take over.
For the test we just need to increase the number of retries, to cope with the
slow test machines.
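
A minimal, self-contained Java sketch of the retry flow described above, for illustration only: it does not use the real {{WALProcedureStore}}. The {{SyncTarget}} interface, the {{syncWithRoll}} helper and the {{FlakyTarget}} fake are hypothetical, and the sketch assumes the simplest reading of the comment: try to sync, on failure roll to a new WAL, repeat up to N times, then give up.

```java
import java.io.IOException;

/** Hypothetical stand-in for "the thing we sync to" (e.g. the current WAL stream). */
interface SyncTarget {
  /** Attempts to sync pending slots; throws if the pipeline is unhealthy. */
  void sync() throws IOException;

  /** Closes the current log and opens a new one. */
  void roll() throws IOException;
}

final class SyncRetrySketch {
  private SyncRetrySketch() {}

  /**
   * Tries to sync; on failure rolls to a new log and tries again, up to maxAttempts times.
   * Returns true if a sync eventually succeeded, false if we gave up
   * (in the real store, that is the point where the master gives up and a backup master takes over).
   */
  static boolean syncWithRoll(SyncTarget target, int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        target.sync();
        return true;
      } catch (IOException syncFailure) {
        try {
          target.roll();
        } catch (IOException rollFailure) {
          // Rolling failed too; this attempt is used up, keep going.
        }
      }
    }
    return false;
  }
}

/** Fake target whose sync fails for the first `outageAttempts` calls, like a restarting DN. */
final class FlakyTarget implements SyncTarget {
  private int remainingFailures;
  int rolls = 0;

  FlakyTarget(int outageAttempts) {
    this.remainingFailures = outageAttempts;
  }

  @Override
  public void sync() throws IOException {
    if (remainingFailures > 0) {
      remainingFailures--;
      throw new IOException("not enough replicas");
    }
  }

  @Override
  public void roll() {
    rolls++;
  }
}
```

With the fake failing its first three syncs, a budget of 3 attempts gives up while 10 attempts recover, which is the intuition behind bumping the retry count on slow test machines.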






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935015#comment-14935015
 ] 

Hadoop QA commented on HBASE-14362:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764208/HBASE-14362.patch
  against master branch at commit 2ea70c7e6c70c4bd689b79718999a948001f3b21.
  ATTACHMENT ID: 12764208

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestImportExport
   org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS
   org.apache.hadoop.hbase.util.TestProcessBasedCluster
   org.apache.hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures

 {color:red}-1 core zombie tests{color}.  There are 4 zombie test(s):
at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:236)
at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan1.testScanEmptyToBBB(TestTableInputFormatScan1.java:84)
at org.apache.hadoop.hbase.mapreduce.TestCellCounter.testCellCounter(TestCellCounter.java:99)
at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2.testExcludeMinorCompaction(TestHFileOutputFormat2.java:1103)
at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat.testWritingPEData(TestHFileOutputFormat.java:334)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15800//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15800//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15800//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15800//console

This message is automatically generated.






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935321#comment-14935321
 ] 

Heng Chen commented on HBASE-14362:
---

{quote}
the only thing we can do if we want to verify the retry code is just bump the 
retries value.
{quote}

So your suggestion is either to mark the issue as invalid or to bump the retries value?






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935302#comment-14935302
 ] 

Heng Chen commented on HBASE-14362:
---

{quote}
in that way we just lose data
{quote}

Yeah, but we can create a class that extends {{WALProcedureStore}} and overrides
{{syncSlots(Stream, slots, offset, count)}}.

How about this?
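
A minimal Java sketch of the subclass-and-override idea, using hypothetical stand-ins rather than HBase code: {{BaseStore}} plays the role of {{WALProcedureStore}}, and its {{syncSlots}} signature is simplified and assumed to be protected and overridable (the real method's signature and visibility may differ). Because the test-only subclass swallows the sync failure, the insert appears to succeed while nothing was made durable, which is the data-loss concern raised about catching the exception.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for WALProcedureStore; not the real HBase class. */
class BaseStore {
  /** Number of slots that were successfully synced. */
  protected int syncedSlots = 0;
  /** Simulates whether the HDFS pipeline currently has enough replicas. */
  protected boolean pipelineHealthy = true;

  /** Syncs `count` slots starting at `offset`; fails when the pipeline is unhealthy. */
  protected void syncSlots(List<byte[]> slots, int offset, int count) throws IOException {
    if (!pipelineHealthy) {
      throw new IOException("not enough replicas");
    }
    syncedSlots += count;
  }

  void insert(List<byte[]> slots) throws IOException {
    syncSlots(slots, 0, slots.size());
  }
}

/** Hypothetical test-only subclass that overrides syncSlots and swallows failures. */
class LenientTestStore extends BaseStore {
  final List<IOException> swallowed = new ArrayList<>();

  @Override
  protected void syncSlots(List<byte[]> slots, int offset, int count) {
    try {
      super.syncSlots(slots, offset, count);
    } catch (IOException e) {
      // The caller never sees the failure, so the slots are silently not durable.
      swallowed.add(e);
    }
  }
}
```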






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935310#comment-14935310
 ] 

Matteo Bertozzi commented on HBASE-14362:
-

The code and the test are correct as they are. The test is flaky by nature, as
the comment points out, since it is bound to the time it takes to restart the DN.
The only thing we can do, if we want to verify the retry code, is bump the
retries value.






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935328#comment-14935328
 ] 

Matteo Bertozzi commented on HBASE-14362:
-

Yeah, bump the retry value. We want to verify this situation, and we know that
the test will pass at some point; we just don't know how long the DN takes to
get back online.
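
Because the DN restart time is unknown, one rough way to choose the bumped value is to size the retry budget against a pessimistic restart time. The Java sketch below assumes, hypothetically, a fixed wait between attempts and a first attempt made right when the outage starts; the store's actual wait/backoff behaviour is not described in this thread, so the {{RetryBudget}} helper and the numbers are illustrative only.

```java
final class RetryBudget {
  private RetryBudget() {}

  /**
   * Smallest number of retries (after the first, failed attempt) such that some retry
   * happens at or after the outage ends, assuming (hypothetically) attempts at
   * t = 0, W, 2W, ... where W = waitBetweenAttemptsMillis.
   */
  static int minRetries(long outageMillis, long waitBetweenAttemptsMillis) {
    if (waitBetweenAttemptsMillis <= 0) {
      throw new IllegalArgumentException("wait must be positive");
    }
    if (outageMillis <= 0) {
      return 0; // no outage: the first attempt already succeeds
    }
    // ceil(outage / wait) using integer arithmetic.
    long retries = (outageMillis + waitBetweenAttemptsMillis - 1) / waitBetweenAttemptsMillis;
    return (int) Math.min(Integer.MAX_VALUE, retries);
  }
}
```

For example, a pessimistic 5 s outage with 500 ms between attempts needs {{minRetries(5000, 500)}} = 10 retries, and a 5001 ms outage needs 11.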






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935956#comment-14935956
 ] 

stack commented on HBASE-14362:
---

Let's try it ([~mbertozzi] is good with it too).

> org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
> duper flaky
> -
>
> Key: HBASE-14362
> URL: https://issues.apache.org/jira/browse/HBASE-14362
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Dima Spivak
>Priority: Critical
> Attachments: HBASE-14362.patch, HBASE-14362.patch, 
> HBASE-14362_v1.patch
>
>
> [As seen in 
> Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/],
>  this test has been super flaky and we should probably address it.





[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936127#comment-14936127
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-1.1 #684 (See 
[https://builds.apache.org/job/HBase-1.1/684/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev 591e52b9657e2039085187fe0c8eb613d475be8b)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936193#comment-14936193
 ] 

Hudson commented on HBASE-14362:


SUCCESS: Integrated in HBase-1.2-IT #175 (See 
[https://builds.apache.org/job/HBase-1.2-IT/175/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev a2db17b796e56f1e208c7a4c1ab83a4dcccbe05e)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936264#comment-14936264
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-1.3 #215 (See 
[https://builds.apache.org/job/HBase-1.3/215/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev 255fd07d357d37c469b20db4d8b7ab52400b0ba8)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936311#comment-14936311
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-TRUNK #6855 (See 
[https://builds.apache.org/job/HBase-TRUNK/6855/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev 4cb3e029b0fe8dabd972fb39aa24f1ff0ad69b4c)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936212#comment-14936212
 ] 

Hudson commented on HBASE-14362:


SUCCESS: Integrated in HBase-1.3-IT #191 (See 
[https://builds.apache.org/job/HBase-1.3-IT/191/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev 255fd07d357d37c469b20db4d8b7ab52400b0ba8)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936243#comment-14936243
 ] 

Hudson commented on HBASE-14362:


FAILURE: Integrated in HBase-1.2 #209 (See 
[https://builds.apache.org/job/HBase-1.2/209/])
HBASE-14362 
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
duper flaky (Heng Chen) (stack: rev a2db17b796e56f1e208c7a4c1ab83a4dcccbe05e)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/procedure/TestWALProcedureStoreOnHDFS.java







[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935686#comment-14935686
 ] 

Hadoop QA commented on HBASE-14362:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764262/HBASE-14362_v1.patch
  against master branch at commit 37877e3f56b038c0821138862813e567390a9ff4.
  ATTACHMENT ID: 12764262

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.6.1 2.7.0 
2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
at org.apache.camel.component.jetty.jettyproducer.HttpJettyProducerRecipientListCustomThreadPoolTest.testRecipientList(HttpJettyProducerRecipientListCustomThreadPoolTest.java:40)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15808//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15808//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15808//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15808//console

This message is automatically generated.






[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934596#comment-14934596
 ] 

Hadoop QA commented on HBASE-14362:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12764151/HBASE-14362.patch
  against master branch at commit f6be2f9bf357d25e2b04afd20170ad20662b834f.
  ATTACHMENT ID: 12764151

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
at org.apache.hadoop.hbase.regionserver.TestHRegion.testFlushCacheWhileScanning(TestHRegion.java:3756)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15792//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15792//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15792//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/15792//console

This message is automatically generated.

> org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super 
> duper flaky
> -
>
> Key: HBASE-14362
> URL: https://issues.apache.org/jira/browse/HBASE-14362
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Dima Spivak
>Priority: Critical
> Attachments: HBASE-14362.patch
>
>
> [As seen in 
> Jenkins|https://builds.apache.org/job/HBase-TRUNK/lastCompletedBuild/testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/history/],
>  this test has been super flaky and we should probably address it.





[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-28 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934427#comment-14934427
 ] 

Heng Chen commented on HBASE-14362:
---

If I set these params
{code}
// Reduce both retry limits to 1 to reproduce the flakiness locally
conf.setInt("hbase.procedure.store.wal.max.roll.retries", 1);
conf.setInt("hbase.procedure.store.wal.sync.failure.roll.max", 1);
{code}

the issue reproduces locally.

I think the reason is the original logic:
{code}
// Original order: insert first, then wait for 3 replicas
store.insert(new TestProcedure(i, -1), null);
waitForNumReplicas(3);
{code}

After we restart the DN, we insert immediately; if the DN has not fully started yet,
the test case fails.

So we call {{waitForNumReplicas(3)}} before the insert; {{waitForNumReplicas(3)}}
waits until the DN has fully started.

After doing that, the test case no longer fails locally.
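
Waiting for the restarted DN before inserting boils down to polling a condition until it holds or a timeout expires. Here is a generic Java sketch of such a wait; {{WaitUtil.waitFor}} and its parameters are hypothetical, and the test's actual {{waitForNumReplicas()}} may work differently.

```java
import java.util.function.BooleanSupplier;

final class WaitUtil {
  private WaitUtil() {}

  /**
   * Polls `condition` every `pollMillis` until it returns true or `timeoutMillis` elapses.
   * Returns true if the condition was observed true before the deadline, false on timeout.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis)
      throws InterruptedException {
    long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      // Overflow-safe comparison of nanoTime values.
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      Thread.sleep(pollMillis);
    }
  }
}
```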



[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-28 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934548#comment-14934548
 ] 

Matteo Bertozzi commented on HBASE-14362:
-

Using waitForNumReplicas() invalidates the test.
The point of the test is to verify what happens when there are not enough replicas.

[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-15 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745657#comment-14745657
 ] 

Heng Chen commented on HBASE-14362:
---

{quote}
But most of these exceptions are caught in WALProcedureStore,
{quote}

That description is a bit misleading.

It should be:

"But most of these exceptions are caught in {{WALProcedureStore.syncLoop}},"

[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-15 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745664#comment-14745664
 ] 

Matteo Bertozzi commented on HBASE-14362:
-

If you look at the TestWALProcedureStoreOnHDFS code, it already says that it is
flaky on slow machines.
You can bump the values as much as you want, but if the machine is even slower
it will fail again.
The other downside is that it will slow down runs on faster machines:
{code}
// increase the value for slow test-env
conf.setInt("hbase.procedure.store.wal.wait.before.roll", 1000);
conf.setInt("hbase.procedure.store.wal.max.roll.retries", 5);
conf.setInt("hbase.procedure.store.wal.sync.failure.roll.max", 5);
{code}

[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-15 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745644#comment-14745644
 ] 

Heng Chen commented on HBASE-14362:
---

After analyzing the log, I found that the test case fails because of this
exception:
{code}
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/test-logs/state-0018.log could only be replicated to 2 nodes 
instead of minReplication (=3).  There are 3 datanode(s) running and 3 node(s) 
are excluded in this operation.
{code}

There are many exceptions of this kind in the log, starting from the very
beginning of it.

But most of these exceptions are caught in {{WALProcedureStore}}, except the last
one, which is thrown by {{syncSlots}} once {{logRolled}} reaches
{{maxSyncFailureRoll}}:

{code}
  private long syncSlots() throws Throwable {
    int retry = 0;
    int logRolled = 0;
    long totalSynced = 0;
    do {
      try {
        totalSynced = syncSlots(stream, slots, 0, slotIndex);
        break;
      } catch (Throwable e) {
        if (++retry >= maxRetriesBeforeRoll) {
          if (logRolled >= maxSyncFailureRoll) {
            LOG.error("Sync slots after log roll failed, abort.", e);
            sendAbortProcessSignal();
            throw e; // here the exception is thrown out, which makes syncLoop exit!!
          }

          if (!rollWriterOrDie()) {
            throw e;
          }

          logRolled++;
          retry = 0;
        }
      }
    } while (isRunning());
    return totalSynced;
  }
{code}

So if I set {{hbase.procedure.store.wal.wait.before.roll}} and
{{hbase.procedure.store.wal.sync.failure.roll.max}} to smaller values,
the test case always fails.
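
To make the retry/roll arithmetic in {{syncSlots}} concrete, here is a minimal stand-alone Java model of the loop above. It is a hypothetical sketch, not HBase code: {{SyncRetryModel}}, {{run}} and {{toleratedFailures}} are invented names, the parameters only mirror the {{maxRetriesBeforeRoll}} and {{maxSyncFailureRoll}} fields in the excerpt (no claim is made here about which config keys feed them), and rolling the log is assumed to always succeed. Under those assumptions, with R = maxRetriesBeforeRoll and F = maxSyncFailureRoll, the loop aborts on the R*(F+1)-th consecutive sync failure, so small values leave very little room for a datanode that is slow to come back.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical model of the retry/roll loop in syncSlots(). A "sync" is any
 * attempt that may fail; rolling the log is assumed to always succeed.
 */
public class SyncRetryModel {

  /** Thrown when the model gives up, mirroring the abort branch. */
  public static class AbortException extends RuntimeException {
    public AbortException(String msg) {
      super(msg);
    }
  }

  /**
   * Runs the loop until a sync succeeds or the abort condition is hit.
   *
   * @param syncFails returns true if the n-th sync attempt (0-based) fails
   * @return the number of failed sync attempts before the successful one
   */
  public static int run(int maxRetriesBeforeRoll, int maxSyncFailureRoll, IntPredicate syncFails) {
    int retry = 0;
    int logRolled = 0;
    int attempt = 0;
    while (true) {
      if (!syncFails.test(attempt)) {
        return attempt; // sync succeeded after 'attempt' failures
      }
      attempt++;
      if (++retry >= maxRetriesBeforeRoll) {
        if (logRolled >= maxSyncFailureRoll) {
          throw new AbortException("gave up after " + attempt + " failed syncs");
        }
        logRolled++; // assumed: the roll itself always succeeds
        retry = 0;
      }
    }
  }

  /** Largest number of consecutive failures that still ends in a successful sync. */
  public static int toleratedFailures(int maxRetriesBeforeRoll, int maxSyncFailureRoll) {
    return maxRetriesBeforeRoll * (maxSyncFailureRoll + 1) - 1;
  }

  public static void main(String[] args) {
    // Every sync fails: with R = 1 and F = 1 the loop gives up on the 2nd failure.
    try {
      run(1, 1, n -> true);
    } catch (AbortException e) {
      System.out.println(e.getMessage()); // gave up after 2 failed syncs
    }
    // R = 2, F = 1 tolerates 3 failures, so 3 transient failures still succeed.
    System.out.println(run(2, 1, n -> n < 3)); // 3
  }
}
```

Modelling failures as a predicate over the attempt index keeps the sketch deterministic, which makes it easy to check the R*(F+1) bound for any pair of values.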


So to fix this issue, we could either increase these values when the test
environment is slow, or catch the exception.

[jira] [Commented] (HBASE-14362) org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS is super duper flaky

2015-09-15 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14745575#comment-14745575
 ] 

Heng Chen commented on HBASE-14362:
---

The issue just reproduced again:

https://builds.apache.org/job/PreCommit-HBASE-Build/15596//testReport/org.apache.hadoop.hbase.master.procedure/TestWALProcedureStoreOnHDFS/testWalRollOnLowReplication/
