[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
[ https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16878993#comment-16878993 ]

Xiaolin Ha commented on HBASE-20728:

I can reproduce this error with the following steps:
# Add more than one server to an rsgroup.
# Move a table to this rsgroup.
# Move all of the table's regions to one server of this rsgroup (this is important: it makes sure the region's 'lastHost' is definitely in this rsgroup; otherwise it may be in another group).
# Stop all the region servers in this rsgroup (better to wait a while).
# Restart the servers in this rsgroup.
# A stuck RIT appears, and the RS name in the {{RIT}} message has the old timestamp, with logs like:
WARN [ProcExecTimeout] assignment.AssignmentManager(1328): STUCK Region-In-Transition rit=OPEN, location=localhost,32843,1562307050191, table=Group_testKillAllRSInGroupAndThenAddNew, region=a763499801435d2f78ab42876c6cb3ec
# If step 5 is changed to adding a new server to this rsgroup, the RIT message in step 6 should still have the old RS info.

The root cause of this problem is the same as HBASE-20368. We discussed it at: https://github.com/apache/hbase/pull/354

> Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
>
> Key: HBASE-20728
> URL: https://issues.apache.org/jira/browse/HBASE-20728
> Project: HBase
> Issue Type: Bug
> Components: master, rsgroup
> Reporter: Biju Nair
> Assignee: Sakthi
> Priority: Minor
>
> If all the RSes in a RSgroup hosting user tables fail and recover, the master still looks for the old RSes (with the old timestamp in the RS identifier) to assign regions, i.e. regions are left in transition, making the tables in the RSGroup unavailable. The user needs to restart {{master}} or manually assign the regions to make the tables available.
>
> Steps to recreate the scenario in a local cluster:
> - Add the required properties to {{site.xml}} to enable {{rsgroup}} and start hbase
> - Bring up multiple region servers using {{local-regionservers.sh start}}
> - Create a {{rsgroup}} and move a subset of {{regionservers}} to the group
> - Create a table, move it to the group and put some data
> - Stop the {{regionservers}} in the group and restart them
> - From the {{master UI}}, we can see that the region for the table is in transition and the RS name in the {{RIT}} message has the old timestamp.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
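The "old timestamp" mentioned above is the third field of the server name in the log line ({{localhost,32843,1562307050191}}): host, port, and start code. The following is a minimal Python sketch, not HBase code, of why a location recorded before the restart no longer matches any live server even though the same host and port are back. The names {{ServerName}}, {{parse_server_name}} and {{find_restarted_instance}} are hypothetical helpers for illustration, and the start code of the restarted server (1562307099999) is made up.

```python
from typing import Iterable, NamedTuple, Optional


class ServerName(NamedTuple):
    """Toy model of a region server identity: host, port, start code."""
    host: str
    port: int
    startcode: int  # start timestamp in milliseconds


def parse_server_name(text: str) -> ServerName:
    """Parse a 'host,port,startcode' string as it appears in the RIT log line."""
    host, port, startcode = text.strip().split(",")
    return ServerName(host, int(port), int(startcode))


def find_restarted_instance(
    stale: ServerName, online: Iterable[ServerName]
) -> Optional[ServerName]:
    """Return the live server on the same host and port with a newer start code.

    Returns None when no such server is online.
    """
    candidates = [
        s
        for s in online
        if s.host == stale.host
        and s.port == stale.port
        and s.startcode > stale.startcode
    ]
    return max(candidates, key=lambda s: s.startcode) if candidates else None


# Location taken from the STUCK Region-In-Transition log line.
rit_location = parse_server_name("localhost,32843,1562307050191")

# Hypothetical list of live servers after the restart (start code is invented).
online_servers = [parse_server_name("localhost,32843,1562307099999")]

# The full identity differs, so an exact match on the old name fails.
print("old location is online:", rit_location in online_servers)
print("restarted instance:", find_restarted_instance(rit_location, online_servers))
```

Comparing on host and port only is a choice made for this illustration; it says nothing about how the actual fix discussed in the pull request resolves the stale location.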
[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
[ https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578930#comment-16578930 ]

Sakthi commented on HBASE-20728:

Was able to recreate the scenario.
[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
[ https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522714#comment-16522714 ]

Sakthi commented on HBASE-20728:

Sure [~apurtell]
[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
[ https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522713#comment-16522713 ]

Andrew Purtell commented on HBASE-20728:

[~jatsakthi] Please go ahead. It looks ok to work on this.
[jira] [Commented] (HBASE-20728) Failure and recovery of all RSes in a RSgroup requires master restart for region assignments
[ https://issues.apache.org/jira/browse/HBASE-20728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514416#comment-16514416 ]

Sakthi commented on HBASE-20728:

Hey [~gsbiju], do you mind if I work on this?