stack created HBASE-20366:
-----------------------------

             Summary: Procedure State != ProcedureState.RUNNABLE; 
IllegalArgumentException
                 Key: HBASE-20366
                 URL: https://issues.apache.org/jira/browse/HBASE-20366
             Project: HBase
          Issue Type: Bug
          Components: amv2
            Reporter: stack


PE Worker dies and Region offlined because Procedure not runable when procedure 
goes to run it. It looks like this:

{code}
2018-04-07 19:58:50,589 INFO  [PEWorker-5] procedure.MasterProcedureScheduler: 
pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure 
hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729.,
 source=ve0534.halxg.cloudera.com,16020,1523153184521, 
destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 
187ee18fb3dd1a7ac1f9f2b667160729
2018-04-07 19:58:50,589 INFO  [PEWorker-14] procedure.MasterProcedureScheduler: 
pid=8302, state=RUNNABLE:MOVE_REGION_ASSIGN; MoveRegionProcedure 
hri=IntegrationTestBigLinkedList,\xEC0\x83\x96*\x86Qsh\xD82\x1E\xAB\x06$\x89,1523151456082.84e97ce42aeb78a2abaf8f17a278b735.,
 source=ve0534.halxg.cloudera.com,16020,1523153184521, 
destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 
84e97ce42aeb78a2abaf8f17a278b735                                                
                                                                                
                                                        2018-04-07 19:58:50,591 
WARN  [PEWorker-5] procedure2.ProcedureExecutor: Worker terminating UNNATURALLY 
null
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; 
MoveRegionProcedure 
hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729.,
 source=ve0534.halxg.cloudera.com,16020,1523153184521, 
destination=ve0542.halxg.cloudera.com,16020,1523155964184
  at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
                                                                                
                                                                                
                                                  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

This killed my job because it offlined a region.

Narrative:

 * Balancer moves this region....
 * Move procedure does dispatch to unassign...
 * Suspiciously, the close comes in unannounced.. .its as though it a close 
from another procedure...

 2018-04-07 19:58:24,296 INFO  [PEWorker-9] assignment.RegionStateStore: 
pid=8305 updating hbase:meta 
row=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729.,
 regionState=CLOSED
 * Master is killed by monkey.
 * Recovery. Region is in CLOSED state.
 * We go to schedule the move region procedure again... Its state must have not 
been updated on master crash.

 2018-04-07 19:58:50,589 INFO  [PEWorker-5] procedure.MasterProcedureScheduler: 
pid=8304, state=WAITING:MOVE_REGION_ASSIGN; MoveRegionProcedure 
hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729.,
 source=ve0534.halxg.cloudera.com,16020,1523153184521, 
destination=ve0542.halxg.cloudera.com,16020,1523155964184 checking lock on 
187ee18fb3dd1a7ac1f9f2b667160729

 * And then we get

 2018-04-07 19:58:50,591 WARN  [PEWorker-5] procedure2.ProcedureExecutor: 
Worker terminating UNNATURALLY null                                             
                                                                                
                                                                           
java.lang.IllegalArgumentException: pid=8304, state=WAITING:MOVE_REGION_ASSIGN; 
MoveRegionProcedure 
hri=IntegrationTestBigLinkedList,p\xC3\x11\xB2,1523155040553.187ee18fb3dd1a7ac1f9f2b667160729.,
 source=ve0534.halxg.cloudera.com,16020,1523153184521, 
destination=ve0542.halxg.cloudera.com,16020,1523155964184   at 
org.apache.hbase.thirdparty.com.google.common.base.Preconditions.checkArgument(Preconditions.java:134)
                                                                                
                                                                                
                                           at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1430)
                                                                                
                                                                                
                                                  at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
                                                                                
                                                                                
                                               at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
                                                                                
                                                                                
                                                       at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to