[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments

2019-07-05 Thread Xiaolin Ha (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878993#comment-16878993
 ] 

Xiaolin Ha commented on HBASE-20728:


I can reproduce this error with the following steps:
 # add more than one server to an rsgroup,
 # move a table to this rsgroup,
 # move all of the table's regions to one server of this rsgroup (this is important: it 
makes sure each region's 'lastHost' is in this rsgroup, otherwise it may be in another group),
 # stop all the region servers in this rsgroup (better to wait a while),
 # restart the servers in this rsgroup,
 # a stuck RIT appears, and the RS name in the {{RIT}} message has the old timestamp, 
with logs like:  WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK 
Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, 
table=Group_testKillAllRSInGroupAndThenAddNew, 
region=a763499801435d2f78ab42876c6cb3ec
 # if step 5 is replaced by adding a new server to this rsgroup, the RIT message in 
step 6 also shows the old RS info.
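
To make the "old timestamp" in the log line above easier to spot, here is a minimal Python sketch that pulls the fields out of such a warning. It assumes the message has the shape quoted in this comment, with the location written as host,port,startcode. The names {{RIT_PATTERN}} and {{parse_stuck_rit}} are hypothetical helpers for illustration and are not part of HBase.

```python
import re

# Matches the STUCK Region-In-Transition warning as quoted in this comment.
# Assumption: the location is written as host,port,startcode.
RIT_PATTERN = re.compile(
    r"STUCK Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<host>[^,\s]+),(?P<port>\d+),(?P<startcode>\d+),\s*"
    r"table=(?P<table>[^,\s]+),\s*region=(?P<region>[0-9a-f]+)"
)


def parse_stuck_rit(line):
    """Return the fields of a STUCK RIT warning as a dict, or None if no match."""
    match = RIT_PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Port and startcode are numeric; the startcode identifies one server start.
    fields["port"] = int(fields["port"])
    fields["startcode"] = int(fields["startcode"])
    return fields


line = (
    "WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK "
    "Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, "
    "table=Group_testKillAllRSInGroupAndThenAddNew, "
    "region=a763499801435d2f78ab42876c6cb3ec"
)
info = parse_stuck_rit(line)
print(info["host"], info["port"], info["startcode"], info["region"])
```

The extracted startcode can then be compared with the startcode of the server currently running on that host and port; if they differ, the RIT still points at the previous server instance.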

The root cause of this problem is the same as HBASE-20368. We discussed it at: 
https://github.com/apache/hbase/pull/354

> Failure and recovery of all RSes in a RSgroup requires master restart for 
> region assignments
> 
>
> Key: HBASE-20728
> URL: https://issues.apache.org/jira/browse/HBASE-20728
> Project: HBase
> Issue Type: Bug
> Components: master, rsgroup
> Reporter: Biju Nair
> Assignee: Sakthi
> Priority: Minor
>
> If all the RSes in an RSgroup hosting user tables fail and recover, the master 
> still looks for the old RSes (with the old timestamp in the RS identifier) to assign 
> regions, i.e. regions are left in transition, making the tables in the RSGroup 
> unavailable. Users need to restart {{master}} or manually assign the regions 
> to make the tables available. Steps to recreate the scenario in a local 
> cluster:
>  - Add the required properties to {{site.xml}} to enable {{rsgroup}} and start 
> hbase
>  - Bring up multiple region servers using {{local-regionservers.sh start}}
>  - Create a {{rsgroup}} and move a subset of {{regionservers}} to the group
>  - Create a table, move it to the group and put some data
>  - Stop the {{regionservers}} in the group and restart them
>  - From the {{master UI}}, we can see that the region for the table is in 
> transition and the RS name in the {{RIT}} message has the old timestamp.
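
The symptom described above (an RS identifier carrying an old timestamp) can be illustrated with a small Python sketch. It assumes an RS identifier has the form host,port,startcode and that a restarted server comes back on the same host and port with a larger startcode. {{ServerName}} here is a local stand-in tuple, and {{parse_server_name}} and {{find_restarted_instance}} are hypothetical helpers, not HBase APIs. The second and third identifiers in the usage lines are made-up values.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    """Local stand-in for an RS identifier: host, port, and start timestamp."""
    host: str
    port: int
    startcode: int


def parse_server_name(text):
    """Parse 'host,port,startcode' into a ServerName tuple."""
    host, port, startcode = text.split(",")
    return ServerName(host, int(port), int(startcode))


def find_restarted_instance(stale, live_servers):
    """Return the live server on the same host:port with a newer startcode.

    Returns None when no such server exists, for example when the region
    server has not come back or came back on a different port.
    """
    candidates = [
        server
        for server in live_servers
        if server.host == stale.host
        and server.port == stale.port
        and server.startcode > stale.startcode
    ]
    return max(candidates, key=lambda server: server.startcode, default=None)


# The stale identifier is taken from the RIT message in this thread;
# the live identifiers are hypothetical values for illustration only.
stale = parse_server_name("localhost,32843,1562307050191")
live = [
    parse_server_name("localhost,32843,1562307999999"),
    parse_server_name("localhost,40001,1562307999999"),
]
print(find_restarted_instance(stale, live))
```

A region whose recorded location matches a stale identifier, while a newer instance exists on the same host and port, is a candidate for the manual assignment mentioned in the description.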



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments

2018-08-13 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578930#comment-16578930
 ] 

Sakthi commented on HBASE-20728:


I was able to recreate the scenario.



[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments

2018-06-25 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522714#comment-16522714
 ] 

Sakthi commented on HBASE-20728:


Sure [~apurtell]



[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments

2018-06-25 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522713#comment-16522713
 ] 

Andrew Purtell commented on HBASE-20728:


[~jatsakthi] Please go ahead. It looks ok to work on this.



[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments

2018-06-15 Thread Sakthi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514416#comment-16514416
 ] 

Sakthi commented on HBASE-20728:


Hey [~gsbiju], do you mind if I work on this?
