[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-03-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16391862#comment-16391862
 ] 

stack commented on HBASE-19965:
---

Two addendums.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: 19965.addendum1.txt, 19965.addendum2.txt, 
> HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365120#comment-16365120
 ] 

stack commented on HBASE-19965:
---

This fell off the flakies list, both TestAsyncRegion* and TestAsyncTable*.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363655#comment-16363655
 ] 

Hudson commented on HBASE-19965:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4582 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4582/])
HBASE-19965 Fix flaky TestAsyncRegionAdminApi; ADDENDUM do same to (stack: rev 
6e35f5eab9799e5d1450f97dec9335800dd39c03)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi3.java


> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363494#comment-16363494
 ] 

stack commented on HBASE-19965:
---

Pushed second addendum that breaks a TestAsyncTableAdminAPI3 out of 
TestAsyncTableAdminAPI.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363464#comment-16363464
 ] 

stack commented on HBASE-19965:
---

Here is for the test that timed out, build 314:

---
Test set: org.apache.hadoop.hbase.client.TestAsyncTableAdminApi
---
Tests run: 30, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 574.271 s <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestAsyncTableAdminApi
org.apache.hadoop.hbase.client.TestAsyncTableAdminApi  Time elapsed: 8.443 s  
<<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds

org.apache.hadoop.hbase.client.TestAsyncTableAdminApi  Time elapsed: 8.473 s  
<<< ERROR!
java.lang.Exception: Appears to be stuck in thread DataXceiver for client 
DFSClient_NONMAPREDUCE_1381247601_23 at /127.0.0.1:40966 [Receiving block 
BP-1735548202-172.17.0.2-1518565636532:blk_1073741829_1005]

Parameterized there are about 29 tests. None takes a particularly long time:

{code}
 1 2018-02-13 23:47:40,557 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testCreateTableWithEmptyRowInTheSplitKeys[0] 
Thread=302, OpenFileDescriptor=1612, MaxFileDescriptor=1048576, 
SystemLoadAverage=2007, ProcessCount=17, AvailableMemoryMB=21888
  2 2018-02-13 23:47:40,659 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testDeleteTable[0] Thread=305, 
OpenFileDescriptor=1614, MaxFileDescriptor=1048576, SystemLoadAverage=1982, 
ProcessCount=17, AvailableMemoryMB=21890
  3 2018-02-13 23:47:51,503 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testDisableAndEnableTables[0] Thread=320, 
OpenFileDescriptor=1616, MaxFileDescriptor=1048576, SystemLoadAverage=1827, 
ProcessCount=17, AvailableMemoryMB=21940
  4 2018-02-13 23:48:32,308 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testCreateTable[0] Thread=344, 
OpenFileDescriptor=1589, MaxFileDescriptor=1048576, SystemLoadAverage=1798, 
ProcessCount=17, AvailableMemoryMB=22132
  5 2018-02-13 23:48:41,898 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testCreateTableWithRegions[0] Thread=344, 
OpenFileDescriptor=1586, MaxFileDescriptor=1048576, SystemLoadAverage=1867, 
ProcessCount=17, AvailableMemoryMB=21816
  6 2018-02-13 23:49:23,348 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testIsTableEnabledAndDisabled[0] Thread=435, 
OpenFileDescriptor=1571, MaxFileDescriptor=1048576, SystemLoadAverage=1732, 
ProcessCount=17, AvailableMemoryMB=21124
  7 2018-02-13 23:49:36,012 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testListTables[0] Thread=441, 
OpenFileDescriptor=1577, MaxFileDescriptor=1048576, SystemLoadAverage=1809, 
ProcessCount=17, AvailableMemoryMB=20713
  8 2018-02-13 23:50:08,285 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testTruncateTablePreservingSplits[0] Thread=375, 
OpenFileDescriptor=1567, MaxFileDescriptor=1048576, SystemLoadAverage=1902, 
ProcessCount=17, AvailableMemoryMB=20604
  9 2018-02-13 23:50:26,969 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testCreateTableNumberOfRegions[0] Thread=373, 
OpenFileDescriptor=1590, MaxFileDescriptor=1048576, SystemLoadAverage=1889, 
ProcessCount=17, AvailableMemoryMB=20979
 10 2018-02-13 23:51:18,514 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testDisableAndEnableTable[0] Thread=439, 
OpenFileDescriptor=1569, MaxFileDescriptor=1048576, SystemLoadAverage=1876, 
ProcessCount=17, AvailableMemoryMB=20712
 11 2018-02-13 23:51:40,787 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testEnableTableRetainAssignment[0] Thread=430, 
OpenFileDescriptor=1575, MaxFileDescriptor=1048576, SystemLoadAverage=1754, 
ProcessCount=17, AvailableMemoryMB=20766
 12 2018-02-13 23:51:59,198 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testTruncateTable[0] Thread=473, 
OpenFileDescriptor=1578, MaxFileDescriptor=1048576, SystemLoadAverage=1826, 
ProcessCount=17, AvailableMemoryMB=21183
 13 2018-02-13 23:52:17,821 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTableAdminApi#testGetTableDescriptor[0] Thread=446, 
OpenFileDescriptor=1576, MaxFileDescriptor=1048576, SystemLoadAverage=1827, 
ProcessCount=17, AvailableMemoryMB=21732
 14 2018-02-13 23:52:27,280 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncTabl

[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362227#comment-16362227
 ] 

Hudson commented on HBASE-19965:


FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4577 (See 
[https://builds.apache.org/job/HBase-Trunk_matrix/4577/])
HBASE-19965 Fix flaky TestAsyncRegionAdminApi (stack: rev 
cf57ea15f1044d69fcfc8edee3c6f710ed78a7e0)
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi2.java
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncRegionAdminApi.java
HBASE-19965 Fix flaky TestAsyncRegionAdminApi; ADDENDUM do same to (stack: rev 
b4622ffad7011b3d29bce6376edb896de6df9f6e)
* (edit) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi.java
* (add) 
hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAsyncTableAdminApi2.java


> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361901#comment-16361901
 ] 

stack commented on HBASE-19965:
---

Pushed an addendum that gives TestAsyncTableAdminApi the same treatment, 
breaking it in two.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361886#comment-16361886
 ] 

stack commented on HBASE-19965:
---

---
Test set: org.apache.hadoop.hbase.client.TestAsyncTableAdminApi
---
Tests run: 34, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 572.494 s <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestAsyncTableAdminApi
org.apache.hadoop.hbase.client.TestAsyncTableAdminApi  Time elapsed: 19.573 s  
<<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds
  at 
org.apache.hadoop.hbase.client.TestAsyncTableAdminApi.testListTables(TestAsyncTableAdminApi.java:122)

org.apache.hadoop.hbase.client.TestAsyncTableAdminApi  Time elapsed: 19.604 s  
<<< ERROR!
java.lang.Exception: Appears to be stuck in thread DataXceiver for client 
DFSClient_NONMAPREDUCE_-1429371430_23 at /127.0.0.1:60612 [Receiving block 
BP-217569409-172.17.0.2-1518490564363:blk_1073741829_1005]

.. has the same pattern. Parameterized. No particular test taking a while:
{code}
2018-02-13 02:56:30,183 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: 
client.TestAsyncTableAdminApi#testCreateTableWithEmptyRowInTheSplitKeys[0] 
Thread=302, OpenFileDescriptor=1610, MaxFileDescriptor=1048576, 
SystemLoadAverage=1755, ProcessCount=17, AvailableMemoryMB=10328
2018-02-13 02:56:30,294 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testDeleteTable[0] Thread=304, 
OpenFileDescriptor=1612, MaxFileDescriptor=1048576, SystemLoadAverage=1755, 
ProcessCount=17, AvailableMemoryMB=10317
2018-02-13 02:56:41,992 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testDisableAndEnableTables[0] Thread=322, 
OpenFileDescriptor=1613, MaxFileDescriptor=1048576, SystemLoadAverage=1763, 
ProcessCount=17, AvailableMemoryMB=9131
2018-02-13 02:57:21,465 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testCreateTable[0] Thread=349, 
OpenFileDescriptor=1596, MaxFileDescriptor=1048576, SystemLoadAverage=1841, 
ProcessCount=17, AvailableMemoryMB=9816
2018-02-13 02:57:35,108 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testModifyColumnFamily[0] Thread=345, 
OpenFileDescriptor=1571, MaxFileDescriptor=1048576, SystemLoadAverage=1801, 
ProcessCount=17, AvailableMemoryMB=10954
2018-02-13 02:57:50,562 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testDisableCatalogTable[0] Thread=343, 
OpenFileDescriptor=1585, MaxFileDescriptor=1048576, SystemLoadAverage=1867, 
ProcessCount=17, AvailableMemoryMB=11009
2018-02-13 02:58:03,013 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testCreateTableWithRegions[0] Thread=341, 
OpenFileDescriptor=1574, MaxFileDescriptor=1048576, SystemLoadAverage=1876, 
ProcessCount=17, AvailableMemoryMB=9139
2018-02-13 02:58:41,708 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testIsTableEnabledAndDisabled[0] 
Thread=454, OpenFileDescriptor=1569, MaxFileDescriptor=1048576, 
SystemLoadAverage=1901, ProcessCount=20, AvailableMemoryMB=9379
2018-02-13 02:58:51,803 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testListTables[0] Thread=459, 
OpenFileDescriptor=1578, MaxFileDescriptor=1048576, SystemLoadAverage=1886, 
ProcessCount=17, AvailableMemoryMB=9836
2018-02-13 02:59:28,658 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testTruncateTablePreservingSplits[0] 
Thread=370, OpenFileDescriptor=1570, MaxFileDescriptor=1048576, 
SystemLoadAverage=1823, ProcessCount=17, AvailableMemoryMB=8980
2018-02-13 02:59:52,804 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testCompactionTimestamps[0] Thread=385, 
OpenFileDescriptor=1575, MaxFileDescriptor=1048576, SystemLoadAverage=1805, 
ProcessCount=17, AvailableMemoryMB=8672
2018-02-13 03:00:11,715 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testTableAvailableWithRandomSplitKeys[0] 
Thread=385, OpenFileDescriptor=1593, MaxFileDescriptor=1048576, 
SystemLoadAverage=1863, ProcessCount=17, AvailableMemoryMB=8835
2018-02-13 03:00:24,545 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#testAddSameColumnFamilyTwice[0] 
Thread=388, OpenFileDescriptor=1584, MaxFileDescriptor=1048576, 
SystemLoadAverage=2003, ProcessCount=17, AvailableMemoryMB=7541
2018-02-13 03:00:35,379 INFO  [Time-limited test] hbase.ResourceChecker(148): 
before: client.TestAsyncTableAdminApi#test

[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361881#comment-16361881
 ] 

stack commented on HBASE-19965:
---

Pushed to master and branch-2. Leaving open for now to see if this fixes it.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361879#comment-16361879
 ] 

stack commented on HBASE-19965:
---

.001 just breaks up the test into two pieces. Other reason test takes a while 
is that it is parameterized.

Let me push this to see how it does overnight.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19965.branch-2.001.patch
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16361854#comment-16361854
 ] 

stack commented on HBASE-19965:
---

It looks like the test can go on longer than our ten minute timeout. Here is 
complaint from last two failures in nightlies.

---
Test set: org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi
---
Tests run: 25, Failures: 0, Errors: 2, Skipped: 2, Time elapsed: 572.508 s <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi
org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi  Time elapsed: 14.642 s  
<<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds
  at 
org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:363)

org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi  Time elapsed: 14.642 s  
<<< ERROR!
java.lang.Exception: Appears to be stuck in thread Socket Reader #1 for port 
35917



---
Test set: org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi
---
Tests run: 24, Failures: 0, Errors: 2, Skipped: 2, Time elapsed: 574.354 s <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi
org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi  Time elapsed: 7.735 s  
<<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 600 seconds
  at 
org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testSplitSwitch(TestAsyncRegionAdminApi.java:285)

org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi  Time elapsed: 7.772 s  
<<< ERROR!
java.lang.Exception: Appears to be stuck in thread Socket Reader #1 for port 
43069





See how there are 20 odd tests run. See how last one was running for 14 seconds 
yet we timed out after 600 seconds. Here are start times for the latest run:

{code}
  1 2018-02-13 02:58:52,380 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testCompactRegionServer[0] Thread=302, 
OpenFileDescriptor=1612, MaxFileDescriptor=1048576, SystemLoadAverage=1886, 
ProcessCount=20, AvailableMemoryMB=9729
  2 2018-02-13 02:59:21,149 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testGetOnlineRegions[0] Thread=324, 
OpenFileDescriptor=1625, MaxFileDescriptor=1048576, SystemLoadAverage=1887, 
ProcessCount=17, AvailableMemoryMB=9766
  3 2018-02-13 02:59:32,669 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testCompactMob[0] Thread=367, 
OpenFileDescriptor=1616, MaxFileDescriptor=1048576, SystemLoadAverage=1814, 
ProcessCount=17, AvailableMemoryMB=8805
  4 2018-02-13 02:59:50,492 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testSplitTable[0] Thread=373, 
OpenFileDescriptor=1630, MaxFileDescriptor=1048576, SystemLoadAverage=1763, 
ProcessCount=17, AvailableMemoryMB=8683
  5 2018-02-13 03:01:18,069 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testAssignRegionAndUnassignRegion[0] Thread=380, 
OpenFileDescriptor=1581, MaxFileDescriptor=1048576, SystemLoadAverage=1761, 
ProcessCount=17, AvailableMemoryMB=9252
  6 2018-02-13 03:01:28,706 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testCompact[0] Thread=386, 
OpenFileDescriptor=1586, MaxFileDescriptor=1048576, SystemLoadAverage=1804, 
ProcessCount=17, AvailableMemoryMB=10421
  7 2018-02-13 03:02:36,046 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testGetRegionByStateOfTable[0] Thread=360, 
OpenFileDescriptor=1573, MaxFileDescriptor=1048576, SystemLoadAverage=1778, 
ProcessCount=17, AvailableMemoryMB=10053
  8 2018-02-13 03:02:51,851 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testFlushTableAndRegion[0] Thread=387, 
OpenFileDescriptor=1581, MaxFileDescriptor=1048576, SystemLoadAverage=1773, 
ProcessCount=20, AvailableMemoryMB=9379
  9 2018-02-13 03:03:09,421 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testSplitSwitch[0] Thread=405, 
OpenFileDescriptor=1590, MaxFileDescriptor=1048576, SystemLoadAverage=1809, 
ProcessCount=17, AvailableMemoryMB=8787
 10 2018-02-13 03:03:33,784 INFO  [Time-limited test] 
hbase.ResourceChecker(148): before: 
client.TestAsyncRegionAdminApi#testMergeRegions[0] Thread=413, 
OpenFileDescriptor=1595, MaxFileDescriptor=1048576, SystemLoadAverage=1795, 
ProcessCount=17, AvailableMemoryMB=9152
 11 2018-02-13 03:04:01

[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359234#comment-16359234
 ] 

stack commented on HBASE-19965:
---

Thank you [~zghaobac] Let me look tomorrow. I came across another instance 
today where on restart, if CLOSED, we presume an outstanding assign though the 
table is disabled This seems similar. Let me study it. Thanks for the 
research.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-09 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359226#comment-16359226
 ] 

Guanghao Zhang commented on HBASE-19965:


For the regions belong to disabled table or splitted region, we skip to assign 
it in AssignProcedure.startTransition method. Maybe we can use the same pattern 
for merged region. Thanks [~stack] for pick up this. :-)
{code:java}
// Don't assign if table is in disabling of disabled state.
TableStateManager tsm = env.getMasterServices().getTableStateManager();
TableName tn = regionNode.getRegionInfo().getTable();
if (tsm.isTableState(tn, TableState.State.DISABLING, 
TableState.State.DISABLED)) {
  LOG.info("Table " + tn + " state=" + tsm.getTableState(tn) + ", skipping 
" + this);
  return false;
}
// If the region is SPLIT, we can't assign it. But state might be CLOSED, 
rather than
// SPLIT which is what a region gets set to when Unassigned as part of 
SPLIT. FIX.
if (regionNode.isInState(State.SPLIT) ||
(regionNode.getRegionInfo().isOffline() && 
regionNode.getRegionInfo().isSplit())) {
  LOG.info("SPLIT, cannot be assigned; " + this + "; " + regionNode +
"; hri=" + regionNode.getRegionInfo());
  return false;
}
{code}


> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358602#comment-16358602
 ] 

stack commented on HBASE-19965:
---

Thank you for the research [~zghaobac]; let me have a go at it boss. Nice work 
sir.

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Assignee: stack
>Priority: Major
> Fix For: 2.0.0-beta-2
>
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-09 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358221#comment-16358221
 ] 

Guanghao Zhang commented on HBASE-19965:


The impl seems not same with the javadoc..
{code:java}
  public enum State {
OFFLINE,// region is in an offline state
OPENING,// server has begun to open but not yet done
OPEN,   // server opened region and updated meta
CLOSING,// server has begun to close but not yet done
CLOSED, // server closed region and updated meta
SPLITTING,  // server started split of a region
SPLIT,  // server completed split of a region
FAILED_OPEN,// failed to open, and won't retry any more
FAILED_CLOSE,   // failed to close, and won't retry any more
MERGING,// server started merge a region
MERGED, // server completed merge a region
SPLITTING_NEW,  // new region to be created when RS splits a parent
// region but hasn't be created yet, or master doesn't
// know it's already created
MERGING_NEW;// new region to be created when RS merges two
// daughter regions but hasn't be created yet, or
// master doesn't know it's already created
{code}

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-09 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358156#comment-16358156
 ] 

Guanghao Zhang commented on HBASE-19965:


When split a region, the parent region was marked as State.SPLIT. And the 
daughter regions were marked as State.SPLITTING_NEW. But for merge regions, 
only the new child region was marked State.MERGED. And don't change the parent 
regions state.
{code:java}
  public void markRegionAsSplit(final RegionInfo parent, final ServerName 
serverName,
  final RegionInfo daughterA, final RegionInfo daughterB)
  throws IOException {
// Update hbase:meta. Parent will be marked offline and split up in 
hbase:meta.
// The parent stays in regionStates until cleared when removed by 
CatalogJanitor.
// Update its state in regionStates to it shows as offline and split when 
read
// later figuring what regions are in a table and what are not: see
// regionStates#getRegionsOfTable
final RegionStateNode node = 
regionStates.getOrCreateRegionStateNode(parent);
node.setState(State.SPLIT);
final RegionStateNode nodeA = 
regionStates.getOrCreateRegionStateNode(daughterA);
nodeA.setState(State.SPLITTING_NEW);
final RegionStateNode nodeB = 
regionStates.getOrCreateRegionStateNode(daughterB);
nodeB.setState(State.SPLITTING_NEW);

regionStateStore.splitRegion(parent, daughterA, daughterB, serverName);
if (shouldAssignFavoredNodes(parent)) {
  List onlineServers = 
this.master.getServerManager().getOnlineServersList();
  ((FavoredNodesPromoter)getBalancer()).
  generateFavoredNodesForDaughter(onlineServers, parent, daughterA, 
daughterB);
}
  }

  /**
   * When called here, the merge has happened. The two merged regions have been
   * unassigned and the above markRegionClosed has been called on each so they 
have been
   * disassociated from a hosting Server. The merged region will be open after 
this call. The
   * merged regions are removed from hbase:meta below> Later they are deleted 
from the filesystem
   * by the catalog janitor running against hbase:meta. It notices when the 
merged region no
   * longer holds references to the old regions.
   */
  public void markRegionAsMerged(final RegionInfo child, final ServerName 
serverName,
  final RegionInfo mother, final RegionInfo father) throws IOException {
final RegionStateNode node = regionStates.getOrCreateRegionStateNode(child);
node.setState(State.MERGED);
regionStates.deleteRegion(mother);
regionStates.deleteRegion(father);
regionStateStore.mergeRegions(child, mother, father, serverName);
if (shouldAssignFavoredNodes(child)) {
  ((FavoredNodesPromoter)getBalancer()).
generateFavoredNodesForMergedRegion(child, mother, father);
}
  }
{code}

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a

[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-08 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357996#comment-16357996
 ] 

Duo Zhang commented on HBASE-19965:
---

I‘m not quite familiar with the AMv2 yet...

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19965) Fix flaky TestAsyncRegionAdminApi

2018-02-08 Thread Guanghao Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357980#comment-16357980
 ] 

Guanghao Zhang commented on HBASE-19965:


[~stack] [~Apache9] Any ideas about how to fix this?

> Fix flaky TestAsyncRegionAdminApi
> -
>
> Key: HBASE-19965
> URL: https://issues.apache.org/jira/browse/HBASE-19965
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Guanghao Zhang
>Priority: Major
>
> See 
> [https://builds.apache.org/job/HBase%20Nightly/job/branch-2/284/testReport/junit/org.apache.hadoop.hbase.client/TestAsyncRegionAdminApi/testMergeRegions_0_/]
>  
> java.lang.AssertionError: expected:<2> but was:<3> at 
> org.apache.hadoop.hbase.client.TestAsyncRegionAdminApi.testMergeRegions(TestAsyncRegionAdminApi.java:359)
>  
> Merge regions not work. The table still have 3 regions after the 
> MergeRegionsProcedure finished.
> The master start balance region 9e2773ba1efba79a2defa276e9a26ed4. But because 
> the MergeRegionsProcedure pid=138 start work first, so the balance need wait 
> for the lock. But after merge regions finished, the MoveRegionProcedure 
> pid=139 start work and assign 9e2773ba1efba79a2defa276e9a26ed4 to a new 
> region server. This is not right. The MoveRegionProcedure should skip to 
> assign a region which was marked as offline. Or we should clear the merged 
> regions' procedure when MergeRegionsProcedure finished.
>  
> Logs:
> 2018-02-08 16:24:44,608 INFO [master/cd4730e3eae2:0.Chore.1] 
> master.HMaster(1454): balance 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318
> 2018-02-08 16:24:44,608 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=4,queue=0,port=37885] 
> procedure2.ProcedureExecutor(868): Stored pid=138, 
> state=RUNNABLE:MERGE_TABLE_REGIONS_PREPARE; MergeTableRegionsProcedure 
> table=testMergeRegions, regions=[9e2773ba1efba79a2defa276e9a26ed4, 
> 8f8fd5cd032313e1aadb83e31e1b7479], forcibly=false
> ..
> 2018-02-08 16:24:50,111 INFO [PEWorker-13] 
> procedure2.ProcedureExecutor(1249): Finished pid=138, state=SUCCESS; 
> MergeTableRegionsProcedure table=testMergeRegions, 
> regions=[9e2773ba1efba79a2defa276e9a26ed4, 8f8fd5cd032313e1aadb83e31e1b7479], 
> forcibly=false in 5.5710sec
> 2018-02-08 16:24:50,113 INFO [PEWorker-13] 
> procedure.MasterProcedureScheduler(813): pid=139, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4., 
> source=cd4730e3eae2,39077,1518106776411, 
> destination=cd4730e3eae2,40578,1518106776318 testMergeRegions 
> testMergeRegions,,1518107079782.9e2773ba1efba79a2defa276e9a26ed4.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)