[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447588#comment-16447588 ]

Toshihiro Suzuki commented on HBASE-20006:

Thank you for reviewing the patch [~busbey] [~stack].

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Fix For: 3.0.0, 2.1.0, 1.5.0
>
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch, HBASE-20006.master.003.patch, HBASE-20006.master.003.patch
>
> Failing 10% of the time. Interestingly, it is below that causes fail. We go
> to split but it is already split. We will then fail the split with an
> internal assert which messes up procedures; at a minimum we should just not
> split (this is in the prepare stage).
> {code}
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd, table=testOnlineSnapshotAfterSplittingRegions-1518736887838, testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.
> 2018-02-15 23:21:42,162 INFO [PEWorker-12] assignment.SplitTableRegionProcedure(440): Split of {ENCODED => 3f850cea7d71a7ebd019f2f009efca4d, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518736887838,,1518736887882.3f850cea7d71a7ebd019f2f009efca4d.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT
> 2018-02-15 23:21:42,163 ERROR [PEWorker-12] procedure2.ProcedureExecutor(1480): CODE-BUG: Uncaught runtime exception: pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518736887838, parent=3f850cea7d71a7ebd019f2f009efca4d, daughterA=06b5e6366efbef155d70e56cfdf58dc9, daughterB=8c175de1b33765a5683ac1e502edb0bd
> java.lang.AssertionError: split region should have an exception here
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:228)
>     at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.executeFromState(SplitTableRegionProcedure.java:89)
>     at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:180)
>     at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1455)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1224)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>     at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
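The failure mode in the quoted description (the prepare step declines to split because the region is already SPLIT, no failure has been recorded, and an internal assert then fires) can be illustrated with a small standalone sketch. This is not the HBase code and not the attached patch: the class `SplitPrepareSketch`, its enums, and both method names are hypothetical, made up only to contrast "assert that a failure was recorded" with "just do not split", as the description suggests.

```java
import java.util.Optional;

/** Hypothetical, simplified model of a split "prepare" step; not HBase code. */
class SplitPrepareSketch {

  /** Illustrative region states; only the two needed for this sketch. */
  enum RegionState { OPEN, SPLIT }

  /** Outcome of the prepare step in this sketch. */
  enum PrepareResult { PROCEED, SKIPPED }

  /**
   * Fragile variant: assumes that whenever prepare declines to split,
   * a failure reason has already been recorded. For a region that is
   * merely already SPLIT there is no recorded failure, so this throws
   * AssertionError, mirroring the message seen in the log.
   */
  static PrepareResult prepareWithAssert(RegionState state, Optional<String> failure) {
    if (state == RegionState.OPEN) {
      return PrepareResult.PROCEED;
    }
    if (!failure.isPresent()) {
      throw new AssertionError("split region should have an exception here");
    }
    throw new IllegalStateException(failure.get());
  }

  /**
   * Tolerant variant: a recorded failure is still surfaced, but an
   * already-split region with no recorded failure is simply skipped.
   */
  static PrepareResult prepareTolerant(RegionState state, Optional<String> failure) {
    if (state == RegionState.OPEN) {
      return PrepareResult.PROCEED;
    }
    if (failure.isPresent()) {
      throw new IllegalStateException(failure.get());
    }
    return PrepareResult.SKIPPED;
  }

  public static void main(String[] args) {
    // Already-split region, no failure recorded: the tolerant variant skips.
    System.out.println(prepareTolerant(RegionState.SPLIT, Optional.empty()));
    // The same input trips the assertion in the fragile variant.
    try {
      prepareWithAssert(RegionState.SPLIT, Optional.empty());
    } catch (AssertionError e) {
      System.out.println("assert variant failed: " + e.getMessage());
    }
  }
}
```

Whether the actual fix skips the split, records a failure before returning, or does something else is not stated in this thread; the sketch only shows why treating "already split" as a no-op avoids the uncaught AssertionError.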
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446771#comment-16446771 ]

Hudson commented on HBASE-20006:

Results for branch master [build #306 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/306/]: (x) *{color:red}-1 overall{color}*
details (if available):
(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/306//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/306//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/306//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446746#comment-16446746 ]

Hudson commented on HBASE-20006:

Results for branch branch-1 [build #290 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/290/]: (x) *{color:red}-1 overall{color}*
details (if available):
(x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/290//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/290//JDK7_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/290//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446464#comment-16446464 ]

Hudson commented on HBASE-20006:

Results for branch branch-2 [build #638 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/638/]: (x) *{color:red}-1 overall{color}*
details (if available):
(/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/638//General_Nightly_Build_Report/]
(x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/638//JDK8_Nightly_Build_Report_(Hadoop2)/]
(x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/638//JDK8_Nightly_Build_Report_(Hadoop3)/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446164#comment-16446164 ]

stack commented on HBASE-20006:

Pardon me [~brfrn169]. Took a look at patch. +1
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426505#comment-16426505 ]

Toshihiro Suzuki commented on HBASE-20006:

The last build looks good. Could you please review? [~stack]
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426497#comment-16426497 ]

Hadoop QA commented on HBASE-20006:

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 3s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} hbase-server: The patch generated 0 new + 36 unchanged - 5 fixed = 36 total (was 41) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 51s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 20s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}102m 3s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}146m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20006 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917635/HBASE-20006.master.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 2eda5cef612b 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 5fed7fd3d2 |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12305/testReport/ |
| Max. process+thread count | 4274 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/12305/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426431#comment-16426431 ]

Toshihiro Suzuki commented on HBASE-20006:

I just reattached the v3 patch to rerun a build.
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407936#comment-16407936 ]

Toshihiro Suzuki commented on HBASE-20006:

Ping [~stack]. Please check this.
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16394200#comment-16394200 ]

Toshihiro Suzuki commented on HBASE-20006:

[~stack] Could you please review the latest patch?

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch, HBASE-20006.master.003.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385549#comment-16385549 ]

Hadoop QA commented on HBASE-20006:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 14s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 3m 53s | master passed |
| +1 | compile | 0m 38s | master passed |
| +1 | checkstyle | 0m 51s | master passed |
| +1 | shadedjars | 5m 3s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 1m 46s | master passed |
| +1 | javadoc | 0m 24s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 3m 54s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | checkstyle | 0m 53s | hbase-server: The patch generated 0 new + 36 unchanged - 5 fixed = 36 total (was 41) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 7s | patch has no errors when building our shaded downstream artifacts. |
| -1 | hadoopcheck | 6m 5s | The patch causes 10 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 8m 0s | The patch causes 10 errors with Hadoop v2.7.4. |
| -1 | hadoopcheck | 10m 6s | The patch causes 10 errors with Hadoop v3.0.0. |
| +1 | findbugs | 2m 1s | the patch passed |
| +1 | javadoc | 0m 24s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 112m 9s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 142m 37s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20006 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912960/HBASE-20006.master.003.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 84f730fccf69 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| Test Results |
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385490#comment-16385490 ]

Ted Yu commented on HBASE-20006:

Ran TestRestoreSnapshotFromClientWithRegionReplicas locally with patch v3 - passed.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch, HBASE-20006.master.003.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385461#comment-16385461 ]

Toshihiro Suzuki commented on HBASE-20006:

I forgot to remove the Ignore annotation from TestRestoreSnapshotFromClientWithRegionReplicas in the v2 patch. I have attached a v3 patch that removes it.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch, HBASE-20006.master.003.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385206#comment-16385206 ]

Ted Yu commented on HBASE-20006:

lgtm

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385174#comment-16385174 ]

Hadoop QA commented on HBASE-20006:

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 48s | master passed |
| +1 | compile | 0m 57s | master passed |
| +1 | checkstyle | 1m 15s | master passed |
| +1 | shadedjars | 6m 28s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 37s | master passed |
| +1 | javadoc | 0m 44s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 16s | the patch passed |
| +1 | compile | 0m 59s | the patch passed |
| +1 | javac | 0m 58s | the patch passed |
| +1 | checkstyle | 1m 18s | hbase-server: The patch generated 0 new + 36 unchanged - 5 fixed = 36 total (was 41) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 14s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 20m 55s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 38s | the patch passed |
| +1 | javadoc | 0m 36s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 110m 10s | hbase-server in the patch passed. |
| +1 | asflicense | 0m 20s | The patch does not generate ASF License warnings. |
| | | 158m 32s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20006 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912923/HBASE-20006.master.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux ff6d210b6adf 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/11796/testReport/ |
| Max. process+thread count | 4342 (vs. ulimit of 1) |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/11796/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385083#comment-16385083 ]

Toshihiro Suzuki commented on HBASE-20006:

I just attached the v2 patch. I changed the hfile names in TestHRegionReplayEvents because hfile names have to match the regex StoreFileInfo.HFILE_NAME_REGEX.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch, HBASE-20006.master.002.patch
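As a rough illustration of the constraint mentioned in the v2 comment, the sketch below checks candidate file names against a pattern before they are used in a test. The pattern is a simplified stand-in chosen for this example (a plain lowercase-hex name); it is an assumption, not the actual value of StoreFileInfo.HFILE_NAME_REGEX, which accepts more forms. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class HFileNameSketch {
  // Simplified stand-in, NOT the real StoreFileInfo.HFILE_NAME_REGEX:
  // assumes a plain hfile name is a non-empty lowercase-hex string.
  static final Pattern PLAIN_HFILE_NAME = Pattern.compile("[0-9a-f]+");

  /** True when the whole name matches the assumed pattern. */
  static boolean looksLikeHFileName(String name) {
    return name != null && PLAIN_HFILE_NAME.matcher(name).matches();
  }

  public static void main(String[] args) {
    // A hex name passes; an arbitrary, made-up test name does not.
    System.out.println(looksLikeHFileName("3f850cea7d71a7ebd019f2f009efca4d"));
    System.out.println(looksLikeHFileName("testfile"));
  }
}
```

Under this assumption, a test that invents arbitrary file names would need to switch to names of the accepted shape, which is the kind of change the comment describes.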
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385015#comment-16385015 ]

Hadoop QA commented on HBASE-20006:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 7s | master passed |
| +1 | compile | 0m 48s | master passed |
| +1 | checkstyle | 1m 13s | master passed |
| +1 | shadedjars | 6m 44s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 2m 40s | master passed |
| +1 | javadoc | 0m 36s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 23s | the patch passed |
| +1 | compile | 0m 53s | the patch passed |
| +1 | javac | 0m 53s | the patch passed |
| +1 | checkstyle | 1m 10s | hbase-server: The patch generated 0 new + 20 unchanged - 5 fixed = 20 total (was 25) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 18s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 20m 57s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 2m 47s | the patch passed |
| +1 | javadoc | 0m 36s | the patch passed |
|| || || || Other Tests ||
| -1 | unit | 109m 46s | hbase-server in the patch failed. |
| +1 | asflicense | 0m 21s | The patch does not generate ASF License warnings. |
| | | 158m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestHRegionReplayEvents |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-20006 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12912909/HBASE-20006.master.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 4d9e9afee0ee 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 485af49e53 |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/11793/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384979#comment-16384979 ]

Toshihiro Suzuki commented on HBASE-20006:

I attached the v1 patch. The problem seems to occur when a snapshot is taken of a table in which some regions still hold reference files to a parent region: when a replica region is then opened, references to HFileLinks are not handled correctly. The v1 patch adds handling for that case.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Attachments: HBASE-20006.branch-2.001.patch, HBASE-20006.master.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372970#comment-16372970 ]

stack commented on HBASE-20006:

Be my guest sir!

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Assignee: Toshihiro Suzuki
> Priority: Critical
> Fix For: 2.0.0
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372556#comment-16372556 ]

Toshihiro Suzuki commented on HBASE-20006:

Hi [~stack], can I try this Jira?

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Priority: Critical
> Fix For: 2.0.0
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367975#comment-16367975 ]

stack commented on HBASE-20006:

Marking critical (for read replica feature at least).

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Reporter: stack
> Priority: Critical
> Fix For: 2.0.0
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367958#comment-16367958 ]

stack commented on HBASE-20006:

Ok, this is some read replica mess. I don't want to dig into the file naming done for read replicas; I'll leave it to a read replicas person, if any are around. And I don't want this messing up our test runs, so for now I'm disabling this test. Other exceptions seen are:

{code}
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /user/jenkins/test-data/463e63dc-23bb-44ff-a32c-033c390552a6/data/default/testRestoreSnapshotAfterSplittingRegions-1518810548820/1c8eb80ac0831f0f27074b953eb647bb/cf/testRestoreSnapshotAfterSplittingRegions-1518810548820=1c8eb80ac0831f0f27074b953eb647bb-bfe5320da17b47e4b1553a14bacbc532
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1836)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1808)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1723)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:588)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:366)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2213)
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1040)
  at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:903)
  at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:871)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7017)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6974)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6945)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6901)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6852)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
  at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:109)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
{code}

... which then makes for failed assigns.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367611#comment-16367611 ]

stack commented on HBASE-20006:

With the patch in place, we make more progress. We now log the following:

2018-02-16 14:35:01,027 INFO [PEWorker-15] procedure.MasterProcedureScheduler(571): pid=105, state=RUNNABLE:SPLIT_TABLE_REGION_PREPARE; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907, table=testOnlineSnapshotAfterSplittingRegions-1518791689780, testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.

2018-02-16 14:35:01,027 INFO [PEWorker-15] assignment.SplitTableRegionProcedure(439): Split of {ENCODED => 034c0b19e0cdb4c5788c2d4172fd16d9, NAME => 'testOnlineSnapshotAfterSplittingRegions-1518791689780,,1518791689824.034c0b19e0cdb4c5788c2d4172fd16d9.', STARTKEY => '', ENDKEY => '1'} skipped; state is already SPLIT

... but rather than failing we then move on to...

2018-02-16 14:35:01,031 INFO [PEWorker-15] procedure2.ProcedureExecutor(1249): Finished pid=105, state=SUCCESS; SplitTableRegionProcedure table=testOnlineSnapshotAfterSplittingRegions-1518791689780, parent=034c0b19e0cdb4c5788c2d4172fd16d9, daughterA=b5355f606c3f6dae55367b082065b41c, daughterB=094cf44c1d0b3a294f42d2017fd99907 in 1.0180sec

... which is good in this case at least.

Now I'm on to a new failure type:

Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://localhost:55231/user/jenkins/test-data/fe7360bf-946e-44d4-8682-120eae0b7055/data/default/testOnlineSnapshotAfterSplittingRegions-1518791702651/1dda732469ff033fa21cc271586a80b5/cf/testOnlineSnapshotAfterSplittingRegions-1518791689780=034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85.3cc16fba4ef7fb478d3eb1626a24a661
  at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:545)
  at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:579)
  at org.apache.hadoop.hbase.regionserver.StoreFileReader.<init>(StoreFileReader.java:104)
  at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:108)
  at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:267)
  at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:352)
  at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:460)
  at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:668)
  at org.apache.hadoop.hbase.regionserver.HStore.lambda$openStoreFiles$0(HStore.java:535)
  ... 6 more
Caused by: java.lang.IllegalArgumentException
  at java.nio.Buffer.position(Buffer.java:244)
  at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:401)
  at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:532)
  ... 14 more

The file name is crazy.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-20006.branch-2.001.patch
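For readers puzzling over the file name in the CorruptHFileException above: it looks like an HFileLink-style name (table=region-hfile) with a trailing ".<encoded region>" suffix of the kind used by split reference files. The sketch below only takes such a name apart with a regular expression so the pieces are visible. It assumes that layout; the class name, method name, and field order are hypothetical and are not HBase API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not part of HBase. */
final class LinkReferenceNameSketch {
  // Assumed layout: <table>=<region>-<hfile>, optionally followed by .<parentRegion>
  private static final Pattern NAME =
      Pattern.compile("(.+)=([0-9a-f]+)-([0-9a-f]+)(?:\\.([0-9a-f]+))?");

  /** Returns {table, region, hfile, suffixOrNull}, or null when the name does not fit the layout. */
  static String[] parse(String fileName) {
    Matcher m = NAME.matcher(fileName);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
  }

  public static void main(String[] args) {
    // The store file name copied from the stack trace in the comment.
    String name = "testOnlineSnapshotAfterSplittingRegions-1518791689780"
        + "=034c0b19e0cdb4c5788c2d4172fd16d9-395104433d8d43e7b6710b6ec44d5b85"
        + ".3cc16fba4ef7fb478d3eb1626a24a661";
    String[] parts = parse(name);
    System.out.println("linked table:     " + parts[0]);
    System.out.println("linked region:    " + parts[1]);
    System.out.println("linked hfile:     " + parts[2]);
    System.out.println("reference suffix: " + parts[3]);
  }
}
```

Read this way, the name is a reference layered on top of a link, which would fit the later observation in this thread that references to HFileLinks were not handled when opening a replica region.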
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366739#comment-16366739 ]

Hudson commented on HBASE-20006:

FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4593 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4593/])
HBASE-20006 TestRestoreSnapshotFromClientWithRegionReplicas is flakey (stack: rev 40f8d20cf7b297a9324319190d03d93563230d6e)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreUtils.java
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/SplitTableRegionProcedure.java
* (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/trace/TestHTraceHooks.java

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366610#comment-16366610 ]

stack commented on HBASE-20006:

I pushed the patch to master and branch-2. Let's see how it does. I also disabled TestHTrace while in here: we don't support htrace in 2.0.0, not yet anyway, and the test is flakey.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
> Fix For: 2.0.0-beta-2
> Attachments: HBASE-20006.branch-2.001.patch
[jira] [Commented] (HBASE-20006) TestRestoreSnapshotFromClientWithRegionReplicas is flakey
[ https://issues.apache.org/jira/browse/HBASE-20006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366606#comment-16366606 ]

stack commented on HBASE-20006:

So, what we have here is a split table followed by a split region. The split region encounters a parent that has been successfully split but has not yet been GC'd. This is a legitimate case, so the assert is wrong. Removing it.

> TestRestoreSnapshotFromClientWithRegionReplicas is flakey
>
> Key: HBASE-20006
> URL: https://issues.apache.org/jira/browse/HBASE-20006
> Project: HBase
> Issue Type: Sub-task
> Reporter: stack
> Priority: Major
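The reasoning in the comment above (a second split request can legitimately find a parent that is already SPLIT but not yet garbage-collected, so the prepare step should skip rather than assert) can be sketched roughly as follows. This is a minimal illustration under that reading, not the actual SplitTableRegionProcedure code; the enum and method names are hypothetical.

```java
/** Hypothetical sketch of an idempotent "prepare" step; not HBase source. */
final class SplitPrepareSketch {
  enum ParentState { OPEN, SPLITTING, SPLIT }

  enum Outcome { PROCEED, SKIP_ALREADY_SPLIT }

  /**
   * Decides what the prepare stage should do for a parent region in the given state.
   * An already-split parent is treated as a benign no-op instead of a CODE-BUG assert.
   */
  static Outcome prepare(ParentState state) {
    switch (state) {
      case OPEN:
        return Outcome.PROCEED;
      case SPLIT:
        // Parent was split earlier and is only waiting to be cleaned up: nothing to do.
        return Outcome.SKIP_ALREADY_SPLIT;
      default:
        // Assumption for this sketch: any other state is reported as a real error.
        throw new IllegalStateException("cannot split parent in state " + state);
    }
  }

  public static void main(String[] args) {
    System.out.println(prepare(ParentState.OPEN));
    System.out.println(prepare(ParentState.SPLIT));
  }
}
```

The design point is simply that "already done" is reported as a normal outcome the caller can finish on, which matches the later log in this thread where the procedure ends in state=SUCCESS after the skip.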