[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-20 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20706:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20706.001.branch-2.0.patch, 
> HBASE-20706.002.branch-2.0.patch, HBASE-20706.003.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-20 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20706:
--
Fix Version/s: (was: 2.0.1)
   2.0.2

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.2
>
> Attachments: HBASE-20706.001.branch-2.0.patch, 
> HBASE-20706.002.branch-2.0.patch, HBASE-20706.003.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-14 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20706:
---
Attachment: HBASE-20706.003.branch-2.0.patch

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20706.001.branch-2.0.patch, 
> HBASE-20706.002.branch-2.0.patch, HBASE-20706.003.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-13 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20706:
---
Attachment: HBASE-20706.002.branch-2.0.patch

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20706.001.branch-2.0.patch, 
> HBASE-20706.002.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-08 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20706:
---
Status: Patch Available  (was: Open)

Big thanks for [~sergey.soldatov] and [~yuzhih...@gmail.com] for helping out in 
the debugging of this.

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20706.001.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20706) [hack] Don't add known not-OPEN regions in reopen phase of MTP

2018-06-08 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated HBASE-20706:
---
Attachment: HBASE-20706.001.branch-2.0.patch

> [hack] Don't add known not-OPEN regions in reopen phase of MTP
> --
>
> Key: HBASE-20706
> URL: https://issues.apache.org/jira/browse/HBASE-20706
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: HBASE-20706.001.branch-2.0.patch
>
>
> Shake-down of ModifyTableProcedure, talked this one out with Stack – "proper" 
> fix is likely pending in HBASE-20682. Using MoveRegionProcedure is likely the 
> wrong construct, we would want something specific to reopen (e.g. a 
> ReopenProcedure).
> However, we're in a really bad state right now. If there are non-open regions 
> for a table which has a modify submitted against it, the entire system locks 
> up in a fast-spin while holding the table's lock. This fills up HDFS with PV2 
> wals, and prevents you from doing anything in the hbase shell to try to fix 
> those unassigned regions. You'll see spam in the master log like:
> {noformat}
> 2018-06-07 03:21:29,448 WARN  [PEWorker-1] procedure.ModifyTableProcedure: 
> Retriable error trying to modify table=METRIC_RECORD_HOURLY_UUID (in 
> state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
> a3dc333606d38aeb6e2ab4b94233cfbc is not OPEN
>     at 
> org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
>     at 
> org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
>     at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
>     at 
> org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
>     at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
>     at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
>     at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
> {noformat}
> We unstuck out internal test cluster giving the following change on top of 
> Sergey's HBASE-20657. When choosing the regions to reopen, if we filter out a 
> table's regions to only be those which are currently OPEN. There may be some 
> transient failures here as well, but a subsequent retry of the reopen step 
> should filter out that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)