[jira] [Updated] (HBASE-17625) Slow to enable a table

2017-02-10 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-17625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-17625:

Description: 
Tried to enable a table with 10k+ regions; it takes more time to generate the 
plan than to do the actual assignment. This is so embarrassing. :)

It turns out that it took quite some time to get the top HDFS block locations 
when registering regions while creating the Cluster object.


  was:
Tried to enable a table with 10k+ regions; it takes more time to generate the 
plan than to do the actual assignment. This is so embarrassing. :)

It turns out that it took quite some time to get the top HDFS block locations 
when registering regions while creating the Cluster object. There is no new 
region server, so why do we need such info when trying to retain the assignment?

Is the region availability logic related to region replicas? Can we avoid such a 
penalty if region replicas are not needed?


> Slow to enable a table
> --
>
> Key: HBASE-17625
> URL: https://issues.apache.org/jira/browse/HBASE-17625
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>
> Tried to enable a table with 10k+ regions; it takes more time to generate the 
> plan than to do the actual assignment. This is so embarrassing. :)
> It turns out that it took quite some time to get the top HDFS block locations 
> when registering regions while creating the Cluster object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17625) Slow to enable a table

2017-02-10 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-17625:
---

 Summary: Slow to enable a table
 Key: HBASE-17625
 URL: https://issues.apache.org/jira/browse/HBASE-17625
 Project: HBase
  Issue Type: Improvement
Reporter: Jimmy Xiang


Tried to enable a table with 10k+ regions; it takes more time to generate the 
plan than to do the actual assignment. This is so embarrassing. :)

It turns out that it took quite some time to get the top HDFS block locations 
when registering regions while creating the Cluster object. There is no new 
region server, so why do we need such info when trying to retain the assignment?

Is the region availability logic related to region replicas? Can we avoid such a 
penalty if region replicas are not needed?
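
As a rough illustration of where the time goes, here is a minimal, self-contained 
sketch (stand-in types only, not HBase's actual balancer code): each region 
registered into the cluster snapshot pays for a block-location lookup, so with 
many regions the "plan generation" phase dwarfs the assignment itself.

{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: stand-in types, not HBase classes.
public class RetainAssignmentTiming {

  // Hypothetical stand-in for fetching the top HDFS block locations of one region.
  static List<String> fetchTopBlockLocations(String region) throws InterruptedException {
    Thread.sleep(1); // pretend each locality lookup costs ~1 ms
    return List.of("rs1", "rs2", "rs3");
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> regions = new ArrayList<>();
    for (int i = 0; i < 1000; i++) {
      regions.add("region-" + i);
    }

    // Phase 1: "plan generation" -- registering every region pulls locality info.
    long t0 = System.nanoTime();
    Map<String, List<String>> localities = new HashMap<>();
    for (String r : regions) {
      localities.put(r, fetchTopBlockLocations(r));
    }
    long planMs = (System.nanoTime() - t0) / 1_000_000;

    // Phase 2: the actual "assignment" -- here just picking the previous server.
    long t1 = System.nanoTime();
    Map<String, String> plan = new HashMap<>();
    for (String r : regions) {
      plan.put(r, localities.get(r).get(0));
    }
    long assignMs = (System.nanoTime() - t1) / 1_000_000;

    System.out.println("plan generation: " + planMs + " ms, assignment: " + assignMs + " ms");
  }
}
{code}

With 1,000 simulated regions the first phase takes seconds while the second is 
negligible, which mirrors the ratio reported above for 10k+ real regions.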



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-11611) Clean up ZK-based region assignment

2015-07-31 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649386#comment-14649386
 ] 

Jimmy Xiang commented on HBASE-11611:
-

This change is in the trunk branch (2.0). We cannot upgrade 0.94 to 2.0 
directly. One upgrade path could be 0.94 -> 0.96/0.98 -> 1.0 -> 2.0.

> Clean up ZK-based region assignment
> ---
>
> Key: HBASE-11611
> URL: https://issues.apache.org/jira/browse/HBASE-11611
> Project: HBase
>  Issue Type: Improvement
>  Components: Region Assignment
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0
>
> Attachments: hbase-11611.addendum, hbase-11611.patch, 
> hbase-11611_v1.patch, hbase-11611_v2.patch
>
>
> We can clean up the ZK-based region assignment code and use the ZK-less one 
> in the master branch, to make the code easier to understand and maintain.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers

2015-05-04 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527748#comment-14527748
 ] 

Jimmy Xiang commented on HBASE-13605:
-

The patch is fine with me for the master branch. For branch 1, I am not sure.

> RegionStates should not keep its list of dead servers
> -
>
> Key: HBASE-13605
> URL: https://issues.apache.org/jira/browse/HBASE-13605
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.0.2, 1.1.1
>
> Attachments: hbase-13605_v1.patch
>
>
> As mentioned in 
> https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
> and HBASE-12844, we should have only one source of cluster membership. 
> RegionStates keeping its own list of dead servers and doing its own liveliness 
> check (ServerManager.isServerReachable()) has caused an assignment problem again in 
> a test cluster where RegionStates "thinks" that the server is dead and 
> SSH will handle the region assignment. However, the RS is not dead at all, 
> living happily, and never gets a ZK expiry, a YouAreDeadException, or anything. 
> This leaves the list of regions unassigned, in OFFLINE state. 
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: 
> Onlined 77dddcd50c22e56bfff133c0e1f9165b on 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 
> 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] 
> zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING 
> to DISABLING
>  Starting unassign of 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), 
> current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, 
> ts=1429520545780,   
> server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: 
> Offlined 77dddcd50c22e56bfff133c0e1f9165b from 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region: 
> {code}
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart 
> assignment.·
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 
> server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't 
> reach online server 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: 
> Updating the state to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead 
> but not processed yet server: 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers

2015-05-04 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526944#comment-14526944
 ] 

Jimmy Xiang commented on HBASE-13605:
-

I confused the dead server list with the processed server list. In ZK-less 
region assignment, we don't need to ping the server any more: the server 
could be restarted right after the ping returns, so we should not rely on the 
ping result, i.e. the ping result is no better than DeadServer.
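
To make the point concrete, here is a small illustrative sketch (stand-in types, 
not HBase's ServerName/ServerManager): a reachability ping only proves that some 
process currently answers at host:port, while the instance the master cares about 
is identified by host, port and start code, so the ping adds nothing over the 
DeadServer bookkeeping.

{code}
import java.util.Objects;

// Illustrative stand-in, not HBase's ServerName: a server *instance* is
// host + port + startCode, so a process restart yields a different identity.
final class ServerInstance {
  final String host;
  final int port;
  final long startCode;

  ServerInstance(String host, int port, long startCode) {
    this.host = host;
    this.port = port;
    this.startCode = startCode;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ServerInstance)) return false;
    ServerInstance s = (ServerInstance) o;
    return port == s.port && startCode == s.startCode && host.equals(s.host);
  }

  @Override
  public int hashCode() {
    return Objects.hash(host, port, startCode);
  }
}

public class PingVsIdentity {
  // A ping only proves *some* process answers at host:port right now.
  static boolean ping(String host, int port) {
    return true; // pretend the endpoint is reachable
  }

  public static void main(String[] args) {
    ServerInstance before = new ServerInstance("rs1.example.com", 16020, 1429520535268L);
    // The region server restarts between the ping and the decision:
    ServerInstance after = new ServerInstance("rs1.example.com", 16020, 1429599999999L);

    boolean reachable = ping(after.host, after.port);
    boolean sameInstance = before.equals(after);

    // Reachable, yet not the instance the master assigned regions to.
    System.out.println("reachable=" + reachable + ", sameInstance=" + sameInstance);
  }
}
{code}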

> RegionStates should not keep its list of dead servers
> -
>
> Key: HBASE-13605
> URL: https://issues.apache.org/jira/browse/HBASE-13605
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.0.2, 1.1.1
>
> Attachments: hbase-13605_v1.patch
>
>
> As mentioned in 
> https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
> and HBASE-12844, we should have only one source of cluster membership. 
> RegionStates keeping its own list of dead servers and doing its own liveliness 
> check (ServerManager.isServerReachable()) has caused an assignment problem again in 
> a test cluster where RegionStates "thinks" that the server is dead and 
> SSH will handle the region assignment. However, the RS is not dead at all, 
> living happily, and never gets a ZK expiry, a YouAreDeadException, or anything. 
> This leaves the list of regions unassigned, in OFFLINE state. 
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: 
> Onlined 77dddcd50c22e56bfff133c0e1f9165b on 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 
> 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] 
> zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING 
> to DISABLING
>  Starting unassign of 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), 
> current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, 
> ts=1429520545780,   
> server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: 
> Offlined 77dddcd50c22e56bfff133c0e1f9165b from 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region: 
> {code}
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart 
> assignment.·
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 
> server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't 
> reach online server 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: 
> Updating the state to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead 
> but not processed yet server: 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13605) RegionStates should not keep its list of dead servers

2015-05-01 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523986#comment-14523986
 ] 

Jimmy Xiang commented on HBASE-13605:
-

The dead server list in RegionStates is a list of servers that are dead and 
have been processed by SSH. It is used by the AM to make sure regions are not 
assigned before SSH has finished log splitting. It is critical for making sure 
there is no data loss.

RegionStates does not know that a server is dead unless SSH tells it. If the 
server is not dead but it is on the dead server list in RegionStates, that is 
possible only if your cluster has some time-sync issue. I was wondering how 
this actually happened. There may be some clue in the log.
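
An illustrative sketch of that guard, using simplified stand-in structures rather 
than the real RegionStates/AssignmentManager code:

{code}
import java.util.HashSet;
import java.util.Set;

// Sketch only: a region whose last server is dead but whose logs have not been
// split yet must not be reassigned, or edits still in those logs could be lost.
public class AssignmentGuard {
  private final Set<String> deadServers = new HashSet<>();          // known dead
  private final Set<String> logSplitDoneServers = new HashSet<>();  // dead AND processed by SSH

  void serverDied(String server) {
    deadServers.add(server);
  }

  void logSplittingFinished(String server) {
    logSplitDoneServers.add(server);
  }

  boolean safeToAssign(String region, String lastServer) {
    if (deadServers.contains(lastServer) && !logSplitDoneServers.contains(lastServer)) {
      System.out.println("Skip assigning " + region
          + ", it is on a dead but not processed yet server: " + lastServer);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    AssignmentGuard guard = new AssignmentGuard();
    String server = "rs1,16020,1429520535268";
    String region = "loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.";
    guard.serverDied(server);
    guard.safeToAssign(region, server);        // skipped: logs not split yet
    guard.logSplittingFinished(server);
    guard.safeToAssign(region, server);        // safe now
  }
}
{code}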

> RegionStates should not keep its list of dead servers
> -
>
> Key: HBASE-13605
> URL: https://issues.apache.org/jira/browse/HBASE-13605
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
> Fix For: 2.0.0, 1.0.2, 1.1.1
>
> Attachments: hbase-13605_v1.patch
>
>
> As mentioned in 
> https://issues.apache.org/jira/browse/HBASE-9514?focusedCommentId=13769761&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13769761
> and HBASE-12844, we should have only one source of cluster membership. 
> RegionStates keeping its own list of dead servers and doing its own liveliness 
> check (ServerManager.isServerReachable()) has caused an assignment problem again in 
> a test cluster where RegionStates "thinks" that the server is dead and 
> SSH will handle the region assignment. However, the RS is not dead at all, 
> living happily, and never gets a ZK expiry, a YouAreDeadException, or anything. 
> This leaves the list of regions unassigned, in OFFLINE state. 
> master assigning the region:
> {code}
> 15-04-20 09:02:25,780 DEBUG [AM.ZK.Worker-pool3-t330] master.RegionStates: 
> Onlined 77dddcd50c22e56bfff133c0e1f9165b on 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 {ENCODED => 
> 77dddcd50c
> {code}
> Master then disabled the table, and unassigned the region:
> {code}
> 2015-04-20 09:02:27,158 WARN  [ProcedureExecutorThread-1] 
> zookeeper.ZKTableStateManager: Moving table loadtest_d1 state from DISABLING 
> to DISABLING
>  Starting unassign of 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b. (offlining), 
> current state: {77dddcd50c22e56bfff133c0e1f9165b state=OPEN, 
> ts=1429520545780,   
> server=os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268}
> bleProcedure$BulkDisabler-0] master.AssignmentManager: Sent CLOSE to 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268 for region 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b.
> 2015-04-20 09:02:27,414 INFO  [AM.ZK.Worker-pool3-t316] master.RegionStates: 
> Offlined 77dddcd50c22e56bfff133c0e1f9165b from 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}
> On table re-enable, AM does not assign the region: 
> {code}
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> balancer.BaseLoadBalancer: Reassigned 25 regions. 25 retained the pre-restart 
> assignment.·
> 2015-04-20 09:02:30,415 INFO  [ProcedureExecutorThread-3] 
> procedure.EnableTableProcedure: Bulk assigning 25 region(s) across 5 
> server(s), retainAssignment=true
> l,16000,1429515659726-GeneralBulkAssigner-4] master.RegionStates: Couldn't 
> reach online server 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> l,16000,1429515659726-GeneralBulkAssigner-4] master.AssignmentManager: 
> Updating the state to OFFLINE to allow to be reassigned by SSH
> nmentManager: Skip assigning 
> loadtest_d1,,1429520544378.77dddcd50c22e56bfff133c0e1f9165b., it is on a dead 
> but not processed yet server: 
> os-amb-r6-us-1429512014-hbase4-6.novalocal,16020,1429520535268
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-02 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393279#comment-14393279
 ] 

Jimmy Xiang commented on HBASE-13337:
-

I looked into it again and tried to reproduce it. Whenever a server is 
restarted, we get lots of java.nio.channels.ClosedChannelException errors. This 
is something new. So I tried with some older code and found that there is no such 
problem at master branch commit 1723245282ba39567f7da4234cdd31ba534cb869 
("Add in an hbasecon2015 logo for the banner").

This does not seem to be a problem with assignment itself. I am not sure if 
branch-1 is affected now.

It looks like this has something to do with the async RPC changes. Connections 
do not seem to be able to recover from server restarts.
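
For illustration only (generic code, not the actual HBase RPC layer): the behaviour 
one would expect here is that a ClosedChannelException drops the cached connection 
and the retry goes out over a fresh one, instead of looping on the dead channel.

{code}
import java.nio.channels.ClosedChannelException;
import java.util.concurrent.Callable;

// Generic retry-with-reconnect sketch, hypothetical interfaces only.
public class ReconnectingCall {

  interface ConnectionFactory {
    Callable<String> newCall() throws Exception; // builds a call on a fresh connection
  }

  static String callWithReconnect(ConnectionFactory factory, int maxTries) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxTries; attempt++) {
      try {
        return factory.newCall().call();   // fresh connection each attempt
      } catch (ClosedChannelException e) {
        last = e;                          // stale channel: retry with a new one
        System.out.println("try=" + attempt + " of " + maxTries + ": " + e);
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    final int[] calls = {0};
    // Simulate a region server restart: the first two attempts hit a closed channel.
    String result = callWithReconnect(() -> () -> {
      if (++calls[0] < 3) throw new ClosedChannelException();
      return "OPEN region ack";
    }, 10);
    System.out.println(result);
  }
}
{code}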

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 IN

[jira] [Updated] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-02 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-13337:

Assignee: (was: Jimmy Xiang)

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> 

[jira] [Updated] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-02 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-13337:

Status: Open  (was: Patch Available)

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Jimmy Xiang
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-02 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392835#comment-14392835
 ] 

Jimmy Xiang commented on HBASE-13337:
-

Branch-1 is affected if ZK-less assignment is turned on.

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Jimmy Xiang
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b35673605

[jira] [Updated] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-13337:

Status: Patch Available  (was: Open)

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Jimmy Xiang
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1

[jira] [Updated] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-13337:

Attachment: HBASE-13337.patch

Attached a patch that checks for such a scenario when a region is forced to be 
assigned with a new plan. [~sreenivasulureddy], could you give it a try?
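
As a rough sketch of the kind of check described (the actual change is in 
HBASE-13337.patch; the types and method below are hypothetical stand-ins): before 
forcing a region offline to use a new plan, look at whether the server instance it 
is recorded on is still in the online list; if every region server was restarted, 
the old instance is gone and a CLOSE to it can only fail.

{code}
import java.util.Set;

// Rough illustration only -- not the actual patch.
public class ForcedPlanCheck {

  static String nextAction(String region, String recordedServer, Set<String> onlineServers) {
    if (onlineServers.contains(recordedServer)) {
      return "send CLOSE to " + recordedServer + " for " + region;
    }
    return "mark " + region + " OFFLINE and reassign; old instance "
        + recordedServer + " is gone";
  }

  public static void main(String[] args) {
    Set<String> online = Set.of("VM1,16040,1427399999999", "VM2,16040,1427399999998");
    // The region state still points at the pre-restart instance (old start code):
    System.out.println(nextAction("t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd.",
        "VM1,16040,1427362531818", online));
  }
}
{code}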

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Jimmy Xiang
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-13337.patch
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.Cl

[jira] [Assigned] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-04-01 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HBASE-13337:
---

Assignee: Jimmy Xiang

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Assignee: Jimmy Xiang
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.Assig

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-03-29 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385920#comment-14385920
 ] 

Jimmy Xiang commented on HBASE-13337:
-

Thanks a lot for verifying it.

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssign

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-03-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384523#comment-14384523
 ] 

Jimmy Xiang commented on HBASE-13337:
-

By the way, as a work-around, if you restart the master as step 6, all regions 
should be assigned as expected.

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one region server.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the RegionServer processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the RegionServers successfully connect to the HMaster.
> *Bug:*
> But no regions are assigned back to the RegionServers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=4

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-03-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384489#comment-14384489
 ] 

Jimmy Xiang commented on HBASE-13337:
-

For a graceful shutdown, there is no need for log splitting. The regions on the dead 
server are still re-assigned by SSH if the master is not restarted.

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one regionserver.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the Regionserver processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the Regionservers successfully connect back to the 
> HMaster.
> *Bug:*
> But no regions are assigned back to the Regionservers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e81

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-03-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384394#comment-14384394
 ] 

Jimmy Xiang commented on HBASE-13337:
-

Or the master doesn't know it soon enough. It is a race between SSH and the 
master's ZK event handling (for a regionserver that is gone). Possible fixes could 
be: (1) fail the region assignments and re-queue the dead server for SSH, or 
(2) fail the master process (by shutting itself down) if such a scenario is detected.
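
To make option (1) a bit more concrete, here is a minimal sketch; the helper names 
(isStaleServer, requeueForSSH) are placeholders, not the actual AssignmentManager API:
{code}
// Sketch of fix (1): if the chosen assignment target is a server the master's
// in-memory list still considers live but that has actually restarted, give up
// on this assignment and hand the dead server back to ServerShutdownHandler.
void assignOrRequeue(String encodedRegionName, String targetServer,
    java.util.function.Predicate<String> isStaleServer,
    java.util.function.Consumer<String> requeueForSSH) {
  if (isStaleServer.test(targetServer)) {
    requeueForSSH.accept(targetServer);  // let SSH re-assign once servers rejoin
    return;                              // fail this assignment attempt quietly
  }
  // ... otherwise send the normal open-region request to targetServer ...
}
{code}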

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one regionserver.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the Regionserver processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the Regionservers successfully connect back to the 
> HMaster.
> *Bug:*
> But no regions are assigned back to the Regionservers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssi

[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.

2015-03-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384353#comment-14384353
 ] 

Jimmy Xiang commented on HBASE-13337:
-

At step 4, all the region servers are down, so the master needs to split their 
logs and recover those regions. In ServerShutdownHandler, we have
{noformat}
  while (!this.server.isStopped() && 
serverManager.countOfRegionServers() < 2) {
{noformat}
to wait until some regionserver joins before assigning any region. 

This looks like a race. Although all the regionservers are down, the master 
doesn't know it yet, so SSH starts to assign regions, which fails.
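
For reference, a minimal sketch of the wait quoted above (the sleep and interrupt 
handling here are assumed, not the exact ServerShutdownHandler body):
{code}
// Block until at least one regionserver has reported in again, or the master stops.
while (!server.isStopped() && serverManager.countOfRegionServers() < 2) {
  try {
    Thread.sleep(100);                    // back off, then re-check the live-server count
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();   // preserve the interrupt status
    break;                                // stop waiting if we are interrupted
  }
}
{code}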

> Table regions are not assigning back, after restarting all regionservers at 
> once.
> -
>
> Key: HBASE-13337
> URL: https://issues.apache.org/jira/browse/HBASE-13337
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0
>Reporter: Y. SREENIVASULU REDDY
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Regions of the table are continuously in state=FAILED_CLOSE.
> {noformat}
> RegionState   
>   
>   RIT time (ms)
> 8f62e819b356736053e06240f7f7c6fd  
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113929
> caf59209ae65ea80fca6bdc6996a7d68  
> t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM2,16040,1427362533691  113929
> db52a74988f71e5cf257bbabf31f26f3  
> t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM3,16040,1427362533691  113920
> 43f3a65b9f9ff283f598c5450feab1f8  
> t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. 
> state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), 
> server=VM1,16040,1427362531818  113920
> {noformat}
> *Steps to reproduce:*
> 1. Start an HBase cluster with more than one regionserver.
> 2. Create a table with pre-created regions (let's say 15 regions).
> 3. Make sure the regions are well balanced.
> 4. Restart all the Regionserver processes at once across the cluster, except 
> the HMaster process.
> 5. After restarting, the Regionservers successfully connect back to the 
> HMaster.
> *Bug:*
> But no regions are assigned back to the Regionservers.
> *Master log shows as follows:*
> {noformat}
> 2015-03-26 15:05:36,201 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,202 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_OPEN&sn=VM1,16040,1427362531818
> 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Force region state offline 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd 
> state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to 
> {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, 
> server=VM1,16040,1427362531818}
> 2015-03-26 15:05:36,244 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.RegionStateStore: Updating row 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with 
> state=PENDING_CLOSE
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10
> 2015-03-26 15:05:36,248 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException for 
> t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10
> 2015-03-26 15:05:36,249 INFO  [VM2,16020,1427362216887-GeneralBulkAssigner-0] 
> master.AssignmentManager: Server VM1,16040,1427362531818 returned 
> java.nio.channels.ClosedChannelException

[jira] [Commented] (HBASE-13194) TableNamespaceManager not ready cause MasterQuotaManager initialization fail

2015-03-12 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358931#comment-14358931
 ] 

Jimmy Xiang commented on HBASE-13194:
-

Looks good to me.

> TableNamespaceManager not ready cause MasterQuotaManager initialization fail 
> -
>
> Key: HBASE-13194
> URL: https://issues.apache.org/jira/browse/HBASE-13194
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0
>Reporter: zhangduo
>Assignee: zhangduo
> Fix For: 2.0.0
>
> Attachments: HBASE-13194.patch
>
>
> This cause TestNamespaceAuditor to fail.
> https://builds.apache.org/job/HBase-TRUNK/6237/testReport/junit/org.apache.hadoop.hbase.namespace/TestNamespaceAuditor/testRegionOperations/
> {noformat}
> 2015-03-10 22:42:01,372 ERROR [hemera:48616.activeMasterManager] 
> namespace.NamespaceStateManager(204): Error while update namespace state.
> java.io.IOException: Table Namespace Manager not ready yet, try again later
>   at 
> org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1912)
>   at 
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:2131)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.initialize(NamespaceStateManager.java:188)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.start(NamespaceStateManager.java:63)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceAuditor.start(NamespaceAuditor.java:57)
>   at 
> org.apache.hadoop.hbase.quotas.MasterQuotaManager.start(MasterQuotaManager.java:88)
>   at 
> org.apache.hadoop.hbase.master.HMaster.initQuotaManager(HMaster.java:902)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:756)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:161)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1455)
>   at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The direct reason is that we do not have a retry here: if init fails, then it 
> always fails. But I skimmed the code, and it seems there are no async init 
> operations when calling finishActiveMasterInitialization, so it is very 
> strange. Need to dig more.
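
For illustration, "a retry here" could look roughly like the sketch below; 
maxAttempts, retryIntervalMs and namespaceStateManager are placeholder names, not 
the actual HMaster code:
{code}
// Retry the namespace-state initialization a few times instead of failing
// master startup on the first "Table Namespace Manager not ready yet" error.
for (int attempt = 1; attempt <= maxAttempts; attempt++) {
  try {
    namespaceStateManager.initialize();   // the call that currently fails once and gives up
    break;                                // initialized successfully
  } catch (IOException e) {
    if (attempt == maxAttempts) {
      throw e;                            // give up after the last attempt
    }
    try {
      Thread.sleep(retryIntervalMs);      // the namespace table may not be assigned yet
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw e;                            // stop retrying if interrupted
    }
  }
}
{code}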



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13194) TableNamespaceManager not ready cause MasterQuotaManager initialization fail

2015-03-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357906#comment-14357906
 ] 

Jimmy Xiang commented on HBASE-13194:
-

The code doesn't match the comment any more, right? If we fail to init the 
namespace table, we throw an exception now. It didn't use to throw an exception 
back when the namespace table was first introduced.

> TableNamespaceManager not ready cause MasterQuotaManager initialization fail 
> -
>
> Key: HBASE-13194
> URL: https://issues.apache.org/jira/browse/HBASE-13194
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: zhangduo
>
> This cause TestNamespaceAuditor to fail.
> https://builds.apache.org/job/HBase-TRUNK/6237/testReport/junit/org.apache.hadoop.hbase.namespace/TestNamespaceAuditor/testRegionOperations/
> {noformat}
> 2015-03-10 22:42:01,372 ERROR [hemera:48616.activeMasterManager] 
> namespace.NamespaceStateManager(204): Error while update namespace state.
> java.io.IOException: Table Namespace Manager not ready yet, try again later
>   at 
> org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1912)
>   at 
> org.apache.hadoop.hbase.master.HMaster.listNamespaceDescriptors(HMaster.java:2131)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.initialize(NamespaceStateManager.java:188)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceStateManager.start(NamespaceStateManager.java:63)
>   at 
> org.apache.hadoop.hbase.namespace.NamespaceAuditor.start(NamespaceAuditor.java:57)
>   at 
> org.apache.hadoop.hbase.quotas.MasterQuotaManager.start(MasterQuotaManager.java:88)
>   at 
> org.apache.hadoop.hbase.master.HMaster.initQuotaManager(HMaster.java:902)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:756)
>   at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:161)
>   at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1455)
>   at java.lang.Thread.run(Thread.java:744)
> {noformat}
> The direct reason is that we do not have a retry here: if init fails, then it 
> always fails. But I skimmed the code, and it seems there are no async init 
> operations when calling finishActiveMasterInitialization, so it is very 
> strange. Need to dig more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13172) TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1

2015-03-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353452#comment-14353452
 ] 

Jimmy Xiang commented on HBASE-13172:
-

+1. Looks good to me. As to the issue [~jeffreyz] pointed out, that part is 
needed. It is preferred that an RS dies naturally (that is, per ZK) instead of 
being marked dead by the AM. A call to isServerReachable should not return 
false information after retries, since we check the start code, provided the 
retries take longer than the ZK session timeout.
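
The fix suggested in the description below (break out of the retry loop on 
InterruptedException) would look roughly like this sketch, keeping the RetryCounter 
usage as quoted; this is a sketch, not the literal patch:
{code}
while (retryCounter.shouldRetry()) {
  // ... attempt the getServerInfo RPC ...
  try {
    retryCounter.sleepUntilNextRetry();
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();   // preserve the interrupt status
    break;                                // and bail out instead of retrying again
  }
  // ...
}
{code}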

> TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1
> 
>
> Key: HBASE-13172
> URL: https://issues.apache.org/jira/browse/HBASE-13172
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: zhangduo
>Assignee: zhangduo
> Attachments: HBASE-13172-branch-1.patch
>
>
> The direct reason is we are stuck in ServerManager.isServerReachable.
> https://builds.apache.org/job/HBase-1.1/253/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/
> {noformat}
> 2015-03-06 04:06:19,430 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=0 of 10
> 2015-03-06 04:07:10,545 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
> {noformat}
> The interval between the first and the last retry log is about 1 minute, and 
> we only wait 1 minute, so the test times out.
> Still do not know why this happens.
> And at the end there are lots of these: 
> {noformat}
> 2015-03-06 04:07:21,529 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.getConnection(RpcClientImpl.java:1261)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1146)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.getServerInfo(AdminProtos.java:22031)
>   at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getServerInfo(ProtobufUtil.java:1797)
>   at 
> org.apache.hadoop.hbase.master.ServerManager.isServerReachable(ServerManager.java:850)
>   at 
> org.apache.hadoop.hbase.master.RegionStates.isServerDeadAndNotProcessed(RegionStates.java:843)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1969)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1576)
>   at 
> org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:48)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {noformat}
> I think the problem is here
> {code:title=ServerManager.java}
> while (retryCounter.shouldRetry()) {
> ...
> try {
>   retryCounter.sleepUntilNextRetry();
> } catch(InterruptedException ie) {
>   Thread.currentThread().interrupt();
> }
> ...
> }
> {code}
> We need to break out of the while loop when getting InterruptedException, not 
> just mark current thread as interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13172) TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1

2015-03-08 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352339#comment-14352339
 ] 

Jimmy Xiang commented on HBASE-13172:
-

Some tests are more flaky on branch-1 than on master because we may kill the RS 
holding meta, which takes longer to recover. On master there is no such issue, 
since meta is on the master all the time. This also means it is usually a bug if 
an assignment-related test is flaky on master. For branch-1, it is a little more 
complicated.

You are right that this test is not meant to test region assignment. If we can 
ensure that the 3 killed RSs don't hold meta, the test may not be that flaky. We 
can add another test for meta handling if there isn't such a test case already.

> TestDistributedLogSplitting.testThreeRSAbort fails several times on branch-1
> 
>
> Key: HBASE-13172
> URL: https://issues.apache.org/jira/browse/HBASE-13172
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.1.0
>Reporter: zhangduo
>
> The direct reason is we are stuck in ServerManager.isServerReachable.
> https://builds.apache.org/job/HBase-1.1/253/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/
> {noformat}
> 2015-03-06 04:06:19,430 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=0 of 10
> 2015-03-06 04:07:10,545 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
> {noformat}
> The interval between the first and the last retry log is about 1 minute, and 
> we only wait 1 minute, so the test times out.
> Still do not know why this happens.
> And at the end there are lots of these: 
> {noformat}
> 2015-03-06 04:07:21,529 DEBUG [AM.-pool300-t1] master.ServerManager(855): 
> Couldn't reach asf906.gq1.ygridcore.net,59366,1425614770146, try=9 of 10
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.getConnection(RpcClientImpl.java:1261)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1146)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.getServerInfo(AdminProtos.java:22031)
>   at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.getServerInfo(ProtobufUtil.java:1797)
>   at 
> org.apache.hadoop.hbase.master.ServerManager.isServerReachable(ServerManager.java:850)
>   at 
> org.apache.hadoop.hbase.master.RegionStates.isServerDeadAndNotProcessed(RegionStates.java:843)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.forceRegionStateToOffline(AssignmentManager.java:1969)
>   at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1576)
>   at 
> org.apache.hadoop.hbase.master.AssignCallable.call(AssignCallable.java:48)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> {noformat}
> I think the problem is here
> {code:title=ServerManager.java}
> while (retryCounter.shouldRetry()) {
> ...
> try {
>   retryCounter.sleepUntilNextRetry();
> } catch(InterruptedException ie) {
>   Thread.currentThread().interrupt();
> }
> ...
> }
> {code}
> We need to break out of the while loop when getting InterruptedException, not 
> just mark current thread as interrupted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13150) TestMasterObserver failing disable table at end of test

2015-03-05 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349319#comment-14349319
 ] 

Jimmy Xiang commented on HBASE-13150:
-

Good analysis. I think we are good to remove that part, as Andrey did in 
HBASE-13076. We should not change (write) a table's state when assigning any 
region; we only need to check (read) the state.

> TestMasterObserver failing disable table at end of test
> ---
>
> Key: HBASE-13150
> URL: https://issues.apache.org/jira/browse/HBASE-13150
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
>
> I see in 
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-TRUNK/6202/testReport/junit/org.apache.hadoop.hbase.coprocessor/TestMasterObserver/testRegionTransitionOperations/
>   , now we have added in timeouts, that we are failing to disable a table. It 
> looks like table is disabled but regions are being opened on the disabled 
> table still, like HBASE-6537
> Let me see if can figure why this happening. Will be back.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13076) Table can be forcibly enabled in AssignmentManager during table disabling.

2015-03-05 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349313#comment-14349313
 ] 

Jimmy Xiang commented on HBASE-13076:
-

I remember that the code removed in this patch was initially introduced in 
HBASE-5155 (and later changed a little in HBASE-6229). At that time, we had some 
trouble telling whether a table was enabled when it was already enabled.

For this issue, I think we can either remove the code or fail the assignment if 
the table is not enabled/enabling. I prefer to remove the code, since the table 
state is checked later anyway (and the change is simpler/safer). (Note: failing 
the assignment now would also be fine, but then we need to update the region 
state accordingly; that's an enhancement. If this doesn't happen a lot, we may 
not need it.)
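
For illustration, the "fail the assignment" alternative could look roughly like the 
sketch below; getTableState and markRegionFailedOpen are placeholder names, not the 
real AssignmentManager API:
{code}
// Placeholder sketch: refuse to assign a region of a table that is being
// disabled, instead of forcing the table back to ENABLED.
String state = getTableState(region.getTable());      // hypothetical helper
if (!"ENABLED".equals(state) && !"ENABLING".equals(state)) {
  markRegionFailedOpen(region, "table is " + state);   // hypothetical helper
  return;                                              // do not proceed with the assignment
}
// ... otherwise proceed with the normal assignment ...
{code}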

> Table can be forcibly enabled in AssignmentManager during table disabling.
> --
>
> Key: HBASE-13076
> URL: https://issues.apache.org/jira/browse/HBASE-13076
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Affects Versions: 2.0.0
>Reporter: Andrey Stepachev
>Assignee: Andrey Stepachev
> Attachments: 23757f039d83f4f17ca18815eae70b28.log, HBASE-13076.patch
>
>
> Got a situation where a region can be opened while the table is being 
> disabled by DisableTableHandler. Here is the relevant log for such a 
> situation. There are no clues as to who issued OPEN to the region.
> Log file attached.
> UPD: A bit more detail. It seems that even when a new state is put into 
> meta, it is still possible to get the previous state.
> That leads to one more round of assignment invoked in 
> AssignmentManager#onRegionClosed.
> UPD: The table becomes ENABLED, and that leads to regions being instructed to 
> assign immediately in onRegionClosed. BulkDisabler will not know about that 
> and will wait indefinitely, because it will not issue unassign for newly 
> opened regions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13076) Table can be forcibly enabled in AssignmentManager during table disabling.

2015-02-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333533#comment-14333533
 ] 

Jimmy Xiang commented on HBASE-13076:
-

It looks like the table state is out of sync with the region states. This patch 
probably doesn't fix that problem. However, if the table state is persisted in 
meta (including for DISABLING and DISABLED tables?), it is still good for this 
patch to remove the dead code.

> Table can be forcibly enabled in AssignmentManager during table disabling.
> --
>
> Key: HBASE-13076
> URL: https://issues.apache.org/jira/browse/HBASE-13076
> Project: HBase
>  Issue Type: Bug
>  Components: master, Region Assignment
>Affects Versions: 2.0.0
>Reporter: Andrey Stepachev
>Assignee: Andrey Stepachev
> Attachments: 23757f039d83f4f17ca18815eae70b28.log, HBASE-13076.patch
>
>
> Got a situation where a region can be opened while the table is being 
> disabled by DisableTableHandler. Here is the relevant log for such a 
> situation. There are no clues as to who issued OPEN to the region.
> Log file attached.
> UPD: A bit more detail. It seems that even when a new state is put into 
> meta, it is still possible to get the previous state.
> That leads to one more round of assignment invoked in 
> AssignmentManager#onRegionClosed.
> UPD: The table becomes ENABLED, and that leads to regions being instructed to 
> assign immediately in onRegionClosed. BulkDisabler will not know about that 
> and will wait indefinitely, because it will not issue unassign for newly 
> opened regions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12958) SSH doing hbase:meta get but hbase:meta not assigned

2015-02-04 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306180#comment-14306180
 ] 

Jimmy Xiang commented on HBASE-12958:
-

+1, good fix. Just one nit: in the change to MetaTableAccessor.java, the null 
check should be at the beginning of the method.
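
As a trivial illustration (hypothetical helper, not the actual MetaTableAccessor 
change), moving the null check to the top means callers fail fast before any meta 
lookup is attempted:
{code}
static void checkRegionName(byte[] regionName) {
  // Validate the input at the very beginning of the method.
  if (regionName == null) {
    throw new IllegalArgumentException("regionName must not be null");
  }
}
{code}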


> SSH doing hbase:meta get but hbase:meta not assigned
> 
>
> Key: HBASE-12958
> URL: https://issues.apache.org/jira/browse/HBASE-12958
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: stack
>Assignee: stack
> Fix For: 1.0.0, 2.0.0, 1.1.0, 0.98.11
>
> Attachments: 12958.txt
>
>
> All master threads are blocked waiting on this call to return:
> {code}
> "MASTER_SERVER_OPERATIONS-c2020:16020-2" #189 prio=5 os_prio=0 
> tid=0x7f4b0408b000 nid=0x7821 in Object.wait() [0x7f4ada24d000]
>java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:168)
> - locked <0x00041c374f50> (a 
> java.util.concurrent.atomic.AtomicBoolean)
> at org.apache.hadoop.hbase.client.HTable.get(HTable.java:881)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.get(MetaTableAccessor.java:208)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getRegionLocation(MetaTableAccessor.java:250)
> at 
> org.apache.hadoop.hbase.MetaTableAccessor.getRegion(MetaTableAccessor.java:225)
> at 
> org.apache.hadoop.hbase.master.RegionStates.serverOffline(RegionStates.java:634)
> - locked <0x00041c1f0d80> (a 
> org.apache.hadoop.hbase.master.RegionStates)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.processServerShutdown(AssignmentManager.java:3298)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:226)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Master is stuck trying to find hbase:meta on the server that just crashed and 
> that we just recovered:
> Mon Feb 02 23:00:02 PST 2015, null, java.net.SocketTimeoutException: 
> callTimeout=6, callDuration=68181: row '' on table 'hbase:meta' at 
> region=hbase:meta,,1.1588230740, 
> hostname=c2022.halxg.cloudera.com,16020,1422944918568, seqNum=0
> Will add more detail in a sec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12034) If I kill single RS in branch-1, all regions end up on Master!

2015-01-27 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293813#comment-14293813
 ] 

Jimmy Xiang commented on HBASE-12034:
-

Thanks a lot for pointing that out. I updated the release notes for HBASE-10923 
a little regarding the "none" value. It is not reliable/feasible to use a space 
character. As to upper case, I saw similar usage in hbase-default.xml, for 
example hbase.regionserver.regionSplitLimit, 
hbase.zookeeper.property.maxClientCnxns, etc., so we have mixed usage. I am open 
to either way. One of the reasons it is not documented in hbase-default.xml is 
that this is not turned on by default in branch-1, so users probably should not 
touch it?

> If I kill single RS in branch-1, all regions end up on Master!
> --
>
> Key: HBASE-12034
> URL: https://issues.apache.org/jira/browse/HBASE-12034
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: stack
>Assignee: Jimmy Xiang
>Priority: Critical
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12034_1.patch, hbase-12034_2.patch
>
>
> This is unexpected.  M should not be carrying regions in branch-1.  Right 
> [~jxiang]?   Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-10923) Control where to put meta region

2015-01-27 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-10923:

Release Note: This patch introduced a new configuration 
"hbase.balancer.tablesOnMaster" to control what tables' regions should be put 
on the master by a load balancer. By default, we will put regions of table acl, 
namespace, and meta on master, i.e. the default configuration is the same as 
"hbase:acl,hbase:namespace,hbase:meta". To put no region on the master, you 
need to set "hbase.balancer.tablesOnMaster" to "none" instead of an empty 
string (the default will be used if it is empty).  (was: This patch introduced a 
new configuration "hbase.balancer.tablesOnMaster" to control what tables' 
regions should be put on the master by a load balancer. By default, we will put 
regions of table acl, namespace, and meta on master, i.e. the default 
configuration is the same as "hbase:acl,hbase:namespace,hbase:meta". To put no 
region on the master, you need to set "hbase.balancer.tablesOnMaster" to " " 
instead of an empty string (the default will be used if it is empty).)
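
For illustration, assuming the property name above, putting no regions on the master 
from client/site configuration would look like this snippet (plain Configuration API; 
not part of the patch itself):
{code}
org.apache.hadoop.conf.Configuration conf =
    org.apache.hadoop.hbase.HBaseConfiguration.create();
// Put no table's regions on the master; an empty string falls back to the default.
conf.set("hbase.balancer.tablesOnMaster", "none");
{code}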

> Control where to put meta region
> 
>
> Key: HBASE-10923
> URL: https://issues.apache.org/jira/browse/HBASE-10923
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.99.0
>
> Attachments: hbase-10923.patch
>
>
> There is a concern on placing meta regions on the master, as in the comments 
> of HBASE-10569. I was thinking we should have a configuration for a load 
> balancer to decide where to put it.  Adjusting this configuration we can 
> control whether to put the meta on master, or other region server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12880) RegionState in state SPLIT doesn't removed from region states

2015-01-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284462#comment-14284462
 ] 

Jimmy Xiang commented on HBASE-12880:
-

[~octo47], these states are not removed from the map. Do you have too many 
regions in such states? If so, I think it is fine to remove them after 30 
minutes to a couple of hours.
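
A minimal sketch of such a time-based cleanup (all names are hypothetical; the real 
RegionStates map is keyed and valued differently):
{code}
import java.util.Iterator;
import java.util.Map;

// Periodically drop entries for regions that were split/merged long enough ago.
static void purgeOldSplitRegions(Map<String, Long> splitTimestamps, long maxAgeMs) {
  long cutoff = System.currentTimeMillis() - maxAgeMs;
  for (Iterator<Map.Entry<String, Long>> it = splitTimestamps.entrySet().iterator(); it.hasNext();) {
    if (it.next().getValue() < cutoff) {
      it.remove();  // forget the parent region's stale state
    }
  }
}
{code}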

> RegionState in state SPLIT doesn't removed from region states
> -
>
> Key: HBASE-12880
> URL: https://issues.apache.org/jira/browse/HBASE-12880
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 2.0.0, 1.1.0
>Reporter: Andrey Stepachev
>Assignee: Andrey Stepachev
> Attachments: HBASE-12880.patch, master-with-split-regions-2-1.jpg
>
>
> During my work on the patch for HBASE-7332 I stumbled on strange behaviour in 
> RegionStates. A split region isn't removed from regionStates in the 
> regionOffline() method, and the RegionState for this region sits in the 
> regionStates map indefinitely (until the RS is rebooted).
> (That is clearly seen in HBASE-7332 by simply creating a table and splitting 
> it from the command line.)
> Was that intended to be so, with some chore eventually removing it from 
> regionStates (I didn't find one with a quick code scan), or can this be a 
> resource leak?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12640) Add Thrift-over-HTTPS and doAs support for Thrift Server

2014-12-17 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250928#comment-14250928
 ] 

Jimmy Xiang commented on HBASE-12640:
-

Done. Thanks.

> Add Thrift-over-HTTPS and doAs support for Thrift Server
> 
>
> Key: HBASE-12640
> URL: https://issues.apache.org/jira/browse/HBASE-12640
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 1.0.0, 2.0.0
>
> Attachments: HBASE-12640_addendum.patch, HBASE-12640_v1.patch, 
> HBASE-12640_v2.patch, HBASE-12640_v3.patch
>
>
> In HBASE-11349, impersonation support was added to the Thrift Server. But 
> the limitation is that the Thrift client must use the same set of credentials 
> throughout the session. These changes help us circumvent this problem by 
> allowing the user to populate the doAs parameter as needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12640) Add Thrift-over-HTTPS and doAs support for Thrift Server

2014-12-17 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12640:

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks Srikanth for the patch. Integrated into branch 1 and master.

> Add Thrift-over-HTTPS and doAs support for Thrift Server
> 
>
> Key: HBASE-12640
> URL: https://issues.apache.org/jira/browse/HBASE-12640
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Fix For: 1.0.0, 2.0.0
>
> Attachments: HBASE-12640_v1.patch, HBASE-12640_v2.patch, 
> HBASE-12640_v3.patch
>
>
> In HBASE-11349, impersonation support was added to the Thrift Server. But 
> the limitation is that the Thrift client must use the same set of credentials 
> throughout the session. These changes help us circumvent this problem by 
> allowing the user to populate the doAs parameter as needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12704) Add demo client which uses doAs functionality on Thrift-over-HTTPS.

2014-12-17 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12704:

   Resolution: Fixed
Fix Version/s: 2.0.0
   1.0.0
   Status: Resolved  (was: Patch Available)

Thanks Srikanth for the patch. Integrated into branch 1 and master.

> Add demo client which uses doAs functionality on Thrift-over-HTTPS.
> ---
>
> Key: HBASE-12704
> URL: https://issues.apache.org/jira/browse/HBASE-12704
> Project: HBase
>  Issue Type: Sub-task
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
>Priority: Minor
> Fix For: 1.0.0, 2.0.0
>
> Attachments: HBASE-12704.patch
>
>
> As per the description.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12704) Add demo client which uses doAs functionality on Thrift-over-HTTPS.

2014-12-17 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250603#comment-14250603
 ] 

Jimmy Xiang commented on HBASE-12704:
-

+1

> Add demo client which uses doAs functionality on Thrift-over-HTTPS.
> ---
>
> Key: HBASE-12704
> URL: https://issues.apache.org/jira/browse/HBASE-12704
> Project: HBase
>  Issue Type: Sub-task
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
>Priority: Minor
> Attachments: HBASE-12704.patch
>
>
> As per the description.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12640) Add Thrift-over-HTTPS and doAs support for Thrift Server

2014-12-17 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250584#comment-14250584
 ] 

Jimmy Xiang commented on HBASE-12640:
-

+1. Looks good to me. Just some nits: authFilter is not used/needed.

> Add Thrift-over-HTTPS and doAs support for Thrift Server
> 
>
> Key: HBASE-12640
> URL: https://issues.apache.org/jira/browse/HBASE-12640
> Project: HBase
>  Issue Type: Improvement
>  Components: Thrift
>Reporter: Srikanth Srungarapu
>Assignee: Srikanth Srungarapu
> Attachments: HBASE-12640_v1.patch, HBASE-12640_v2.patch
>
>
> In HBASE-11349, impersonation support was added to the Thrift Server. But 
> the limitation is that the Thrift client must use the same set of credentials 
> throughout the session. These changes help us circumvent this problem by 
> allowing the user to populate the doAs parameter as needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12572) Meta flush hangs

2014-11-25 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224872#comment-14224872
 ] 

Jimmy Xiang commented on HBASE-12572:
-

You probably won't be able to find this commit. It's my local commit to revert 
surefire to 2.17 (just a simple one-line pom.xml change). The parent sha is 
b1f7d7cd32d4c1ea1b9207472dfab6ca257aa800 (HBASE-12448).

> Meta flush hangs
> 
>
> Key: HBASE-12572
> URL: https://issues.apache.org/jira/browse/HBASE-12572
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Jimmy Xiang
> Attachments: master.jstack, meta-flushing.png
>
>
> Not sure if this is still an issue with the latest branch 1 code. I ran into 
> this with branch 1 commit: 0.99.2-SNAPSHOT, 
> revision=290749fc56d07461441bd532f62d70f562eee588.
> Jstack shows lots of scanners blocked at close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12572) Meta flush hangs

2014-11-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12572:

Attachment: meta-flushing.png

> Meta flush hangs
> 
>
> Key: HBASE-12572
> URL: https://issues.apache.org/jira/browse/HBASE-12572
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Jimmy Xiang
> Attachments: master.jstack, meta-flushing.png
>
>
> Not sure if this is still an issue with the latest branch 1 code. I ran into 
> this with branch 1 commit: 0.99.2-SNAPSHOT, 
> revision=290749fc56d07461441bd532f62d70f562eee588.
> Jstack shows lots of scanners blocked at close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12572) Meta flush hangs

2014-11-25 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12572:

Attachment: master.jstack

> Meta flush hangs
> 
>
> Key: HBASE-12572
> URL: https://issues.apache.org/jira/browse/HBASE-12572
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Jimmy Xiang
> Attachments: master.jstack
>
>
> Not sure if this is still an issue with the latest branch 1 code. I ran into 
> this with branch 1 commit: 0.99.2-SNAPSHOT, 
> revision=290749fc56d07461441bd532f62d70f562eee588.
> Jstack shows lots of scanners blocked at close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12572) Meta flush hangs

2014-11-25 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12572:
---

 Summary: Meta flush hangs
 Key: HBASE-12572
 URL: https://issues.apache.org/jira/browse/HBASE-12572
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Jimmy Xiang


Not sure if this is still an issue with the latest branch 1 code. I ran into 
this with branch 1 commit: 0.99.2-SNAPSHOT, 
revision=290749fc56d07461441bd532f62d70f562eee588.

Jstack shows lots of scanners blocked at close.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12555) Region mover should not try to move regions to master

2014-11-21 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12555:
---

 Summary: Region mover should not try to move regions to master
 Key: HBASE-12555
 URL: https://issues.apache.org/jira/browse/HBASE-12555
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang


If meta and master are co-located, the master is a region server. The region 
mover script may try to move regions to the master, which will fail since the 
load balancer doesn't allow that. The script should be fixed so that it does not 
move regions to the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12464) meta table region assignment stuck in the FAILED_OPEN state due to region server not fully ready to serve

2014-11-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219591#comment-14219591
 ] 

Jimmy Xiang commented on HBASE-12464:
-

The patch for 2.0 looks good to me. Thanks.

> meta table region assignment stuck in the FAILED_OPEN state due to region 
> server not fully ready to serve
> -
>
> Key: HBASE-12464
> URL: https://issues.apache.org/jira/browse/HBASE-12464
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.0.0, 2.0.0, 0.99.1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 2.0.0
>
> Attachments: HBASE-12464.v1-1.0.patch, HBASE-12464.v1-2.0.patch, 
> HBASE-12464.v2-2.0.patch
>
>   Original Estimate: 24h
>  Time Spent: 7.4h
>  Remaining Estimate: 1h
>
> meta table region assignment could reach the 'FAILED_OPEN' state, which 
> makes the region unavailable unless the target region server shuts down or 
> there is manual resolution.  This is an undesirable state for the meta table region.
> Here is the sequence how this could happen (the code is in 
> AssignmentManager#assign()):
> Step 1: Master detects a region server (RS1) that hosts one meta table region 
> is down, it changes the meta region state from 'online' to 'offline'
> Step 2: In a loop (with a configurable maximumAttempts count, default 10, 
> minimum 1), AssignmentManager tries to find an RS to host the meta table 
> region.  If there is no RS available, it would loop forever by resetting the 
> loop count (BUG#1 from this logic - a small bug) 
> {code}
>if (region.isMetaRegion()) {
>   try {
> Thread.sleep(this.sleepTimeBeforeRetryingMetaAssignment);
> if (i == maximumAttempts) i = 1; // ==> BUG: if 
> maximumAttempts is 1, then the loop will end.
> continue;
>   } catch (InterruptedException e) {
>   ...
>}
> {code}
> Step 3: Once a new RS is found (RS2), inside the same loop as Step 2, 
> AssignmentManager tries to assign the meta region to RS2 (OFFLINE, RS1 => 
> PENDING_OPEN, RS2).  If for some reason that opening the region in RS2 failed 
> (eg. the target RS2 is not ready to serve - ServerNotRunningYetException), 
> AssignmentManager would change the state from (PENDING_OPEN, RS2) to 
> (FAILED_OPEN, RS2).  then it would retry (and even change the RS server to go 
> to).  The retry is up to maximumAttempts.  Once the maximumAttempts is 
> reached, the meta region will be in the 'FAILED_OPEN' state, unless either 
> (1).  RS2 shutdown to trigger region assignment again or (2). it is 
> reassigned by an operator via HBase Shell.  
> Based on the document ( http://hbase.apache.org/book/regions.arch.html ), 
> this is by design - "17. For regions in FAILED_OPEN or FAILED_CLOSE states , 
> the master tries to close them again when they are reassigned by an operator 
> via HBase Shell.".  
> However, this is bad design, especially for the meta table region (it is 
> arguable that the design is good for a regular table - for this ticket, I am 
> more focused on fixing the meta region availability issue).  
> I propose 2 possible fixes:
> Fix#1 (band-aid change): in Step 3, just like Step 2, if the region is a meta 
> table region, reset the loop count so that it would not leave the loop with 
> meta table region in FAILED_OPEN state.
> Fix#2 (more involved): if a region is in FAILED_OPEN state, we should provide 
> a way to automatically trigger AssignmentManager::assign() after a short 
> period of time (leaving any region in FAILED_OPEN state or other states like 
> 'FAILED_CLOSE' is undesirable, should have some way to retrying and auto-heal 
> the region).
> I think at least for 1.0.0, Fix#1 is good enough.  We can open a task-type of 
> JIRA for Fix#2 in future release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12479) Backport HBASE-11689 (Track meta in transition) to 0.98 and branch-1

2014-11-18 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217219#comment-14217219
 ] 

Jimmy Xiang commented on HBASE-12479:
-

My bad. I missed that part.

> Backport HBASE-11689 (Track meta in transition) to 0.98 and branch-1
> 
>
> Key: HBASE-12479
> URL: https://issues.apache.org/jira/browse/HBASE-12479
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 0.98.9, 0.99.2
>
> Attachments: HBASE-12479-0.98.patch
>
>
> Required for zk-less assignment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-18 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217212#comment-14217212
 ] 

Jimmy Xiang commented on HBASE-12480:
-

You meant testOpenFailed? Yes, it may be null.  I see. It's better to handle 
it. Thanks.

> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process this regions. For zk less assignment, 
> we should do the same



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12479) Backport HBASE-11689 (Track meta in transition) to 0.98 and branch-1

2014-11-18 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217187#comment-14217187
 ] 

Jimmy Xiang commented on HBASE-12479:
-

It is used for ZK-less region assignment and for non-colocated meta and master. 
When we committed HBASE-11689, we didn't do a patch for 1.0/0.98 because it's 
hard to get the master right in those branches for both ZK-based and ZK-less 
region assignments. The attached patch didn't touch HMaster at all, so it seems 
the patch is not complete.

> Backport HBASE-11689 (Track meta in transition) to 0.98 and branch-1
> 
>
> Key: HBASE-12479
> URL: https://issues.apache.org/jira/browse/HBASE-12479
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 0.98.9, 0.99.2
>
> Attachments: HBASE-12479-0.98.patch
>
>
> Required for zk-less assignment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12464) meta table region assignment stuck in the FAILED_OPEN state due to region server not fully ready to serve

2014-11-18 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216603#comment-14216603
 ] 

Jimmy Xiang commented on HBASE-12464:
-

It's not good for meta to be stuck in FAILED_OPEN. I agree we should handle it 
differently. The patch looks good. Just a couple of things:

1. Can we add a log (info/debug level may be fine) when we reset the retry 
count to 0?
2. We also need to prevent the meta region from going to FAILED_OPEN in 
AssignmentManager#onRegionFailedOpen.

What about FAILED_CLOSE? That should be fine, since the meta region is still 
available?

bq. Fix#2 (more involved): if a region is in FAILED_OPEN state, we should 
provide a way to automatically trigger AssignmentManager::assign() after a 
short period of time (leaving any region in FAILED_OPEN state or other states 
like 'FAILED_CLOSE' is undesirable, should have some way to retrying and 
auto-heal the region).
Is this essentially the same as setting maximumAttempts to a huge number? In 
many cases, a region may not be able to heal automatically without intervention. 
Personally, I think a better monitoring system would serve us better in this 
case.
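
For reference, Fix#1 plus the log asked for in (1) could look roughly like this 
sketch (the retry index i, maximumAttempts and LOG are as in the assign() loop 
quoted in the description below; this is not a literal patch):
{code}
if (region.isMetaRegion() && i == maximumAttempts) {
  LOG.info("Retrying assignment of " + region.getRegionNameAsString()
      + "; resetting the retry count so hbase:meta never ends up in FAILED_OPEN");
  i = 0;  // keep looping instead of falling through to FAILED_OPEN
}
{code}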

> meta table region assignment stuck in the FAILED_OPEN state due to region 
> server not fully ready to serve
> -
>
> Key: HBASE-12464
> URL: https://issues.apache.org/jira/browse/HBASE-12464
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 1.0.0, 2.0.0, 0.99.1
>Reporter: Stephen Yuan Jiang
>Assignee: Stephen Yuan Jiang
> Fix For: 1.0.0, 2.0.0, 0.99.2
>
> Attachments: HBASE-12464.v1-2.0.patch
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> meta table region assignment could reach the 'FAILED_OPEN' state, which 
> makes the region unavailable unless the target region server shuts down or 
> there is manual resolution.  This is an undesirable state for the meta table region.
> Here is the sequence how this could happen (the code is in 
> AssignmentManager::assign()):
> Step 1: The master detects that a region server (RS1) hosting the meta table region 
> is down; it changes the meta region state from 'online' to 'offline'.
> Step 2: In a loop (with a configurable maximumAttempts count, default 10, 
> minimum 1), AssignmentManager tries to find an RS to host the meta table 
> region.  If there is no RS available, it would loop forever by resetting the 
> loop count (!!BUG#1 from this logic - a small bug!!) 
> if (region.isMetaRegion()) {
>   try {
>     Thread.sleep(this.sleepTimeBeforeRetryingMetaAssignment);
>     if (i == maximumAttempts) i = 1; // ==> BUG: if maximumAttempts is 1, then the loop will end.
>     continue;
>   } catch (InterruptedException e) {
>     ...
>   }
> }
> Step 3: Once a new RS is found (RS2), inside the same loop as Step 2, 
> AssignmentManager tries to assign the meta region to RS2 (OFFLINE, RS1 => 
> PENDING_OPEN, RS2).  If for some reason opening the region on RS2 fails 
> (e.g. the target RS2 is not ready to serve - ServerNotRunningYetException), 
> AssignmentManager would change the state from (PENDING_OPEN, RS2) to 
> (FAILED_OPEN, RS2).  Then it would retry (and possibly pick a different RS to go 
> to).  The retry is limited to maximumAttempts.  Once maximumAttempts is 
> reached, the meta region stays in the 'FAILED_OPEN' state, unless either 
> (1) RS2 shuts down, triggering region assignment again, or (2) it is 
> reassigned by an operator via HBase Shell.  
> Based on the document ( http://hbase.apache.org/book/regions.arch.html ), 
> this is by design - "17. For regions in FAILED_OPEN or FAILED_CLOSE states , 
> the master tries to close them again when they are reassigned by an operator 
> via HBase Shell.".  
> However, this is bad design, especially for the meta table region (it is arguable 
> that the design is good for regular tables - for this ticket, I am more focused 
> on fixing the meta region availability issue).  
> I propose 2 possible fixes:
> Fix#1 (band-aid change): in Step 3, just like Step 2, if the region is a meta 
> table region, reset the loop count so that it would not leave the loop with 
> meta table region in FAILED_OPEN state.
> Fix#2 (more involved): if a region is in FAILED_OPEN state, we should provide 
> a way to automatically trigger AssignmentManager::assign() after a short 
> period of time (leaving any region in FAILED_OPEN state or other states like 
> 'FAILED_CLOSE' is undesirable, should have some way to retrying and auto-heal 
> the region).
> I think at least for 1.0.0, Fix#1 is good enough.  We can open a task-type 
> JIRA for Fix#2 in a future release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-17 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215430#comment-14215430
 ] 

Jimmy Xiang commented on HBASE-12480:
-

bq. Passing null to ConcurrentMap.keySet().contains() will throw NPE
Right. OK.

bq. Hmm, will make the change. My initial thinking was that we need to make 
blocking calls but that doesn't seem to matter.
We may not want to call unassign directly.  It's better to do it asynchronously 
via invokeUnAssign.
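
For reference, a minimal standalone illustration of the NPE being discussed (ConcurrentHashMap rejects null keys, so the null check has to come before the contains/containsKey call; the map and variable names here are made up):

{code}
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyCheck {
  public static void main(String[] args) {
    ConcurrentHashMap<String, String> onlineServers = new ConcurrentHashMap<>();
    onlineServers.put("rs1,60020,1414546531945", "up");

    String serverName = null;
    // onlineServers.keySet().contains(serverName) would throw NullPointerException
    // here, because ConcurrentHashMap does not allow null keys.
    boolean online = serverName != null && onlineServers.containsKey(serverName);
    System.out.println("online = " + online);  // prints: online = false
  }
}
{code}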

> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process these regions. For zk-less assignment, 
> we should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213334#comment-14213334
 ] 

Jimmy Xiang commented on HBASE-12480:
-

I see. 

{noformat}
 +&& serverName != null && onlineServers.contains(serverName)) {
{noformat}
No need for this change. If serverName is null, onlineServers should not contain it.

bq. isServerOnline(ServerName) will return false when serverName is null (It 
will be null in case 2 above)
In the master branch, the server should never be null if it is in these states.

{noformat}
+  case FAILED_CLOSE:
+  case FAILED_OPEN:
+unassign(regionInfo, regionState.getServerName(), null);
...
{noformat}
Should use invokeUnAssign.
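
To illustrate the suggestion, a toy sketch of queueing the unassign asynchronously during failover processing (the executor and method bodies here are stand-ins, not the real AssignmentManager#invokeUnAssign):

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy sketch: on master failover, queue the unassign work for FAILED_OPEN/FAILED_CLOSE
// regions instead of making a blocking unassign() call inline.
public class FailoverUnassignSketch {

  enum State { FAILED_OPEN, FAILED_CLOSE, OPEN }

  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  // Stand-in for an asynchronous invokeUnAssign: schedule, don't block.
  void invokeUnAssign(String encodedRegionName) {
    pool.submit(() -> System.out.println("unassigning " + encodedRegionName));
  }

  void processRegionOnFailover(String encodedRegionName, State state) {
    switch (state) {
      case FAILED_OPEN:
      case FAILED_CLOSE:
        invokeUnAssign(encodedRegionName);  // asynchronous; the failover scan keeps going
        break;
      default:
        break;  // other states handled elsewhere
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FailoverUnassignSketch sketch = new FailoverUnassignSketch();
    sketch.processRegionOnFailover("689b77e1bad7e951b0d9ef4663b217e9", State.FAILED_OPEN);
    sketch.pool.shutdown();
    sketch.pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
{code}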


> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process these regions. For zk-less assignment, 
> we should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213294#comment-14213294
 ] 

Jimmy Xiang commented on HBASE-12480:
-

Before regions get into such states, we have tried many times already. If the 
region server dies, SSH will retry in case things have changed. If the region 
server stays up, there may be no need to retry at all. If an admin fixes the 
problem causing the failed open/close, they can re-assign the region from the shell. 
What do you think?

BTW, no need to change 
serverManager.isServerOnline(regionState.getServerName()) I think; it should do 
exactly what you want.

> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process these regions. For zk-less assignment, 
> we should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213291#comment-14213291
 ] 

Jimmy Xiang edited comment on HBASE-12480 at 11/15/14 2:30 AM:
---

Are you sure this is an issue for 2.0.0? I remember I tried to file a similar 
jira before and didn't because it turned out not to be an issue after I looked into 
it (for the master branch, not other branches).


was (Author: jxiang):
Are you sure this is an issue? I remember I tried to file a similar jira before 
and didn't because it turned out not to be an issue after I looked into it (for the 
master branch, not other branches).

> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process these regions. For zk-less assignment, 
> we should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12480) Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover

2014-11-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14213291#comment-14213291
 ] 

Jimmy Xiang commented on HBASE-12480:
-

Are you sure this is an issue? I remember I tried to file a similar jira before 
and didn't because it turned out not to be an issue after I looked into it (for the 
master branch, not other branches).

> Regions in FAILED_OPEN/FAILED_CLOSE should be processed on master failover 
> ---
>
> Key: HBASE-12480
> URL: https://issues.apache.org/jira/browse/HBASE-12480
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Reporter: Virag Kothari
>Assignee: Virag Kothari
> Fix For: 2.0.0, 0.98.9, 0.99.2
>
> Attachments: HBASE-12480.patch
>
>
> For zk assignment, we used to process these regions. For zk-less assignment, 
> we should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12453) Make region available once it's open

2014-11-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206644#comment-14206644
 ] 

Jimmy Xiang commented on HBASE-12453:
-

We used to update the znode, then update meta.  In this case, it is something like 
"update znode" vs "notify master".  This should not be an issue.

The issue is that if the master is down, we can't notify the master now. That's what 
I was thinking about.


> Make region available once it's open
> 
>
> Key: HBASE-12453
> URL: https://issues.apache.org/jira/browse/HBASE-12453
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> Currently (in trunk, with zk-less assignment), a region is available to 
> serve requests only after the RS notifies the master that the region is open, and 
> the meta is updated with the new location. We may be able to do better than 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12453) Make region available once it's open

2014-11-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206621#comment-14206621
 ] 

Jimmy Xiang commented on HBASE-12453:
-

I looked into it and found it would introduce quite a few racing issues.

> Make region available once it's open
> 
>
> Key: HBASE-12453
> URL: https://issues.apache.org/jira/browse/HBASE-12453
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> Currently (in trunk, with zk-less assignment), a region is available to 
> serve requests only after the RS notifies the master that the region is open, and 
> the meta is updated with the new location. We may be able to do better than 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12453) Make region available once it's open

2014-11-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-12453.
-
Resolution: Invalid

> Make region available once it's open
> 
>
> Key: HBASE-12453
> URL: https://issues.apache.org/jira/browse/HBASE-12453
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> Currently (in trunk, with zk-less assignment), a region is available to 
> serve requests only after the RS notifies the master that the region is open, and 
> the meta is updated with the new location. We may be able to do better than 
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12453) Make region available once it's open

2014-11-10 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12453:
---

 Summary: Make region available once it's open
 Key: HBASE-12453
 URL: https://issues.apache.org/jira/browse/HBASE-12453
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Currently (in trunk, with zk-less assignment), a region is available to serve 
requests only after the RS notifies the master that the region is open, and the meta 
is updated with the new location. We may be able to do better than this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12398) Region isn't assigned in an extreme race condition

2014-11-01 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193657#comment-14193657
 ] 

Jimmy Xiang commented on HBASE-12398:
-

The master branch should not have such a problem because only the master updates 
the region states (step b won't happen). So I think we don't need a patch for 
the master branch.

> Region isn't assigned in an extreme race condition
> --
>
> Key: HBASE-12398
> URL: https://issues.apache.org/jira/browse/HBASE-12398
> Project: HBase
>  Issue Type: Bug
>  Components: Region Assignment
>Affects Versions: 0.98.7
>Reporter: Jeffrey Zhong
>Assignee: Jeffrey Zhong
> Attachments: HBASE-12398.patch
>
>
> In a test, [~enis] has seen a condition which made one of the regions 
> unassigned. 
> The client failed since the region is not online anywhere: 
> {code}
> 2014-10-29 01:51:40,731 WARN  [HBaseReaderThread_13] 
> util.MultiThreadedReader: 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=35, exceptions:
> Wed Oct 29 01:39:51 UTC 2014, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@cc21330, 
> org.apache.hadoop.hbase.NotServingRegionException: 
> org.apache.hadoop.hbase.NotServingRegionException: Region 
> IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9.
>  is not online on hor8n08.gq1.ygridcore.net,60020,1414546670414
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2774)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4257)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2906)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29990)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2078)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:722)
> {code}
> The root cause of the issue is an extreme race condition:
> a) a region is about to open and receives a closeRpc request triggered by a 
> second re-assignment
> b) the second re-assignment updates the region state to offline, which is immediately 
> overwritten to OPEN by the previous region-open ZK notification
> c) when the region is reopened on the same RS by the second assignment, AM forces 
> the region to close as its region state isn't in a PendingOpenOrOpening 
> state.  
> d) the region ends up offline & can't serve any requests
> Region Server Side:
> 1) The RS (hor8n10) has almost finished opening region 689b77e1bad7e951b0d9ef4663b217e9 
> when it receives a closeRegion request.
> {noformat}
> 2014-10-29 01:39:43,153 INFO  
> [PriorityRpcServer.handler=2,queue=0,port=60020] regionserver.HRegionServer: 
> Received CLOSE for the region:689b77e1bad7e951b0d9ef4663b217e9 , which we are 
> already trying to OPEN. Cancelling OPENING.
> {noformat}
> 2) Since region 689b77e1bad7e951b0d9ef4663b217e9 was already opened except for 
> some final steps, the RS logs the following message and closes 
> 689b77e1bad7e951b0d9ef4663b217e9 immediately after updating the ZK node 
> state to 'OPENED'.
> {noformat}
> 2014-10-29 01:39:43,198 ERROR [RS_OPEN_REGION-hor8n10:60020-0] 
> handler.OpenRegionHandler: Race condition: we've finished to open a region, 
> while a close was requested  on 
> region=IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9..
>  It can be a critical error, as a region that should be closed is now opened. 
> Closing it now
> {noformat}
> On the Master Server Side:
> {noformat}
> 2014-10-29 01:39:43,177 DEBUG [AM.ZK.Worker-pool2-t55] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=hor8n10.gq1.ygridcore.net,60020,1414546531945, 
> region=689b77e1bad7e951b0d9ef4663b217e9, 
> current_state={689b77e1bad7e951b0d9ef4663b217e9 state=OPENING, 
> ts=1414546783152, server=hor8n10.gq1.ygridcore.net,60020,1414546531945}
> 
> 2014-10-29 01:39:43,255 DEBUG [AM.-pool1-t16] master.AssignmentManager: 
> Offline 
> IntegrationTestRegionReplicaReplication,0666,1414545619766_0001.689b77e1bad7e951b0d9ef4663b217e9.,
>  it's not any more on hor8n10.gq1.ygridcore.net,60020,1414546531945
> 
> 2014-10-29 01:39:43,942 DEBUG [AM.ZK.Worker-pool2-t58] 
> master.AssignmentManager: Handling RS_ZK_REGION_OPENED, 
> server=hor8n10.gq1.ygridcore.net,60020,1414546531945, 
> region=689b77e1bad7e951b0d9ef4663b217e9, 
> current_state={689b77e1bad7e951b0d9ef4663b

[jira] [Updated] (HBASE-12380) TestRegionServerNoMaster#testMultipleOpen is flaky after HBASE-11760

2014-10-30 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12380:

   Resolution: Fixed
Fix Version/s: 2.0.0
 Assignee: Esteban Gutierrez
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into branch master. Thanks Esteban for the patch.

> TestRegionServerNoMaster#testMultipleOpen is flaky after HBASE-11760
> 
>
> Key: HBASE-12380
> URL: https://issues.apache.org/jira/browse/HBASE-12380
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Assignee: Esteban Gutierrez
> Fix For: 2.0.0
>
> Attachments: HBASE-12380.v0.patch
>
>
> Noticed this while trying to fix a faulty test while working on a fix for 
> HBASE-12219:
> {code}
> Tests in error:
>   TestRegionServerNoMaster.testMultipleOpen:237 » Service 
> java.io.IOException: R...
>   TestRegionServerNoMaster.testCloseByRegionServer:211->closeRegionNoZK:201 » 
> Service
> {code}
> Initially I thought the problem was in my patch for HBASE-12219, but I noticed 
> that the issue was occurring on the 7th attempt to open the region. However, I 
> was able to reproduce the same problem in the master branch after increasing 
> the number of requests in testMultipleOpen():
> {code}
> 2014-10-29 15:03:45,043 INFO  [Thread-216] regionserver.RSRpcServices(1334): 
> Receiving OPEN for the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which we are already trying to OPEN - ignoring this new request for this 
> region.
> Submitting openRegion attempt: 16 <
> 2014-10-29 15:03:45,044 INFO  [Thread-216] regionserver.RSRpcServices(1311): 
> Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
> 2014-10-29 15:03:45,044 INFO  
> [PostOpenDeployTasks:025198143197ea68803e49819eae27ca] 
> hbase.MetaTableAccessor(1307): Updated row 
> TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca. 
> with server=192.168.1.105,63082,1414620220789
> Submitting openRegion attempt: 17 <
> 2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] 
> handler.OpenRegionHandler(88): Region 025198143197ea68803e49819eae27ca was 
> already online when we started processing the opening. Marking this new 
> attempt as failed
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): 
> ABORTING region server 192.168.1.105,63082,1414620220789: Received OPEN for 
> the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which is already online
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): 
> RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2014-10-29 15:03:45,054 WARN  [Thread-216] regionserver.HRegionServer(1955): 
> Unable to report fatal error to master
> com.google.protobuf.ServiceException: java.io.IOException: Call to 
> /192.168.1.105:63079 failed on local exception: java.io.IOException: 
> Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707)
> at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1308)
> at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerNoMaster.testMultiple

[jira] [Commented] (HBASE-12380) TestRegionServerNoMaster#testMultipleOpen is flaky after HBASE-11760

2014-10-30 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190479#comment-14190479
 ] 

Jimmy Xiang commented on HBASE-12380:
-

+1

> TestRegionServerNoMaster#testMultipleOpen is flaky after HBASE-11760
> 
>
> Key: HBASE-12380
> URL: https://issues.apache.org/jira/browse/HBASE-12380
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
> Attachments: HBASE-12380.v0.patch
>
>
> Noticed this while trying to fix a faulty test while working on a fix for 
> HBASE-12219:
> {code}
> Tests in error:
>   TestRegionServerNoMaster.testMultipleOpen:237 » Service 
> java.io.IOException: R...
>   TestRegionServerNoMaster.testCloseByRegionServer:211->closeRegionNoZK:201 » 
> Service
> {code}
> Initially I thought the problem was in my patch for HBASE-12219, but I noticed 
> that the issue was occurring on the 7th attempt to open the region. However, I 
> was able to reproduce the same problem in the master branch after increasing 
> the number of requests in testMultipleOpen():
> {code}
> 2014-10-29 15:03:45,043 INFO  [Thread-216] regionserver.RSRpcServices(1334): 
> Receiving OPEN for the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which we are already trying to OPEN - ignoring this new request for this 
> region.
> Submitting openRegion attempt: 16 <
> 2014-10-29 15:03:45,044 INFO  [Thread-216] regionserver.RSRpcServices(1311): 
> Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
> 2014-10-29 15:03:45,044 INFO  
> [PostOpenDeployTasks:025198143197ea68803e49819eae27ca] 
> hbase.MetaTableAccessor(1307): Updated row 
> TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca. 
> with server=192.168.1.105,63082,1414620220789
> Submitting openRegion attempt: 17 <
> 2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] 
> handler.OpenRegionHandler(88): Region 025198143197ea68803e49819eae27ca was 
> already online when we started processing the opening. Marking this new 
> attempt as failed
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): 
> ABORTING region server 192.168.1.105,63082,1414620220789: Received OPEN for 
> the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which is already online
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): 
> RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2014-10-29 15:03:45,054 WARN  [Thread-216] regionserver.HRegionServer(1955): 
> Unable to report fatal error to master
> com.google.protobuf.ServiceException: java.io.IOException: Call to 
> /192.168.1.105:63079 failed on local exception: java.io.IOException: 
> Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707)
> at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.openRegion(RSRpcServices.java:1308)
> at 
> org.apache.hadoop.hbase.regionserver.TestRegionServerNoMaster.testMultipleOpen(TestRegionServerNoMaster.java:237)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.Deleg

[jira] [Commented] (HBASE-12380) Too many attempts to open a region can crash the RegionServer

2014-10-30 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190360#comment-14190360
 ] 

Jimmy Xiang commented on HBASE-12380:
-

I have discussed it with Esteban. We agree that it is better not to abort. We 
can log a warning/error message instead and let it go.

The reason for aborting is that this scenario should never happen naturally. 
The master has a state machine and won't send the open call again if the region is 
already opened.
My concern with not aborting is that we may hide some serious bug in the master if 
that indeed happens.

This is an old test. My suggestion is to remove it.
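
A toy sketch of the warn-and-ignore alternative discussed above (names and messages are illustrative, not the actual RSRpcServices/HRegionServer code):

{code}
// Toy sketch: treat a duplicate OPEN request as a loud warning instead of a
// reason to abort the whole region server.
public class DuplicateOpenSketch {

  static boolean handleOpenRequest(boolean regionAlreadyOnline, String regionName) {
    if (regionAlreadyOnline) {
      System.out.println("WARN: received OPEN for region " + regionName
          + " which is already online; ignoring the request. This could hide a"
          + " master state-machine bug, so keep the log message loud.");
      return false;  // request ignored, region server keeps running
    }
    System.out.println("opening region " + regionName);
    return true;
  }

  public static void main(String[] args) {
    handleOpenRequest(true, "025198143197ea68803e49819eae27ca");
  }
}
{code}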

> Too many attempts to open a region can crash the RegionServer
> -
>
> Key: HBASE-12380
> URL: https://issues.apache.org/jira/browse/HBASE-12380
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Esteban Gutierrez
>Priority: Critical
>
> Noticed this while trying to fix a faulty test while working on a fix for 
> HBASE-12219:
> {code}
> Tests in error:
>   TestRegionServerNoMaster.testMultipleOpen:237 » Service 
> java.io.IOException: R...
>   TestRegionServerNoMaster.testCloseByRegionServer:211->closeRegionNoZK:201 » 
> Service
> {code}
> Initially I thought the problem was in my patch for HBASE-12219, but I noticed 
> that the issue was occurring on the 7th attempt to open the region. However, I 
> was able to reproduce the same problem in the master branch after increasing 
> the number of requests in testMultipleOpen():
> {code}
> 2014-10-29 15:03:45,043 INFO  [Thread-216] regionserver.RSRpcServices(1334): 
> Receiving OPEN for the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which we are already trying to OPEN - ignoring this new request for this 
> region.
> Submitting openRegion attempt: 16 <
> 2014-10-29 15:03:45,044 INFO  [Thread-216] regionserver.RSRpcServices(1311): 
> Open TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.
> 2014-10-29 15:03:45,044 INFO  
> [PostOpenDeployTasks:025198143197ea68803e49819eae27ca] 
> hbase.MetaTableAccessor(1307): Updated row 
> TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca. 
> with server=192.168.1.105,63082,1414620220789
> Submitting openRegion attempt: 17 <
> 2014-10-29 15:03:45,046 ERROR [RS_OPEN_REGION-192.168.1.105:63082-2] 
> handler.OpenRegionHandler(88): Region 025198143197ea68803e49819eae27ca was 
> already online when we started processing the opening. Marking this new 
> attempt as failed
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1931): 
> ABORTING region server 192.168.1.105,63082,1414620220789: Received OPEN for 
> the 
> region:TestRegionServerNoMaster,,1414620223682.025198143197ea68803e49819eae27ca.,
>  which is already online
> 2014-10-29 15:03:45,047 FATAL [Thread-216] regionserver.HRegionServer(1937): 
> RegionServer abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2014-10-29 15:03:45,054 WARN  [Thread-216] regionserver.HRegionServer(1955): 
> Unable to report fatal error to master
> com.google.protobuf.ServiceException: java.io.IOException: Call to 
> /192.168.1.105:63079 failed on local exception: java.io.IOException: 
> Connection to /192.168.1.105:63079 is closing. Call id=4, waitTime=2
> at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1707)
> at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1757)
> at 
> org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.reportRSFatalError(RegionServerStatusProtos.java:8301)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1952)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abortRegionServer(MiniHBaseCluster.java:174)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$100(MiniHBaseCluster.java:108)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$2.run(MiniHBaseCluster.java:167)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:356)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
> at 
> org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:277)
> at 
> org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.abort(MiniHBaseCluster.java:165)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1964)
> at 
> org.apache

[jira] [Commented] (HBASE-12319) Inconsistencies during region recovery due to close/open of a region during recovery

2014-10-23 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14181901#comment-14181901
 ] 

Jimmy Xiang commented on HBASE-12319:
-

+1. Looks good to me.

> Inconsistencies during region recovery due to close/open of a region during 
> recovery
> 
>
> Key: HBASE-12319
> URL: https://issues.apache.org/jira/browse/HBASE-12319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.7, 0.99.1
>Reporter: Devaraj Das
>Assignee: Jeffrey Zhong
> Attachments: HBASE-12319.patch
>
>
> In one of my test runs, I saw the following:
> {noformat}
> 2014-10-14 13:45:30,782 DEBUG 
> [StoreOpener-51af4bd23dc32a940ad2dd5435f00e1d-1] regionserver.HStore: loaded 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/test_cf/d6df5cfe15ca41d68c619489fbde4d04,
>  isReference=false, isBulkLoadResult=false, seqid=141197, majorCompaction=true
> 2014-10-14 13:45:30,788 DEBUG [RS_OPEN_REGION-hor9n01:60020-1] 
> regionserver.HRegion: Found 3 recovered edits file(s) under 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d
> .
> .
> 2014-10-14 13:45:31,916 WARN  [RS_OPEN_REGION-hor9n01:60020-1] 
> regionserver.HRegion: Null or non-existent edits file: 
> hdfs://hor9n01.gq1.ygridcore.net:8020/apps/hbase/data/data/default/IntegrationTestIngest/51af4bd23dc32a940ad2dd5435f00e1d/recovered.edits/0198080
> {noformat}
> The above logs is from a regionserver, say RS2. From the initial analysis it 
> seemed like the master asked a certain regionserver to open the region (let's 
> say RS1) and for some reason asked it to close soon after. The open was still 
> proceeding on RS1 but the master reassigned the region to RS2. This also 
> started the recovery but it ended up seeing an inconsistent view of the 
> recovered-edits files (it reports missing files as per the logs above) since 
> the first regionserver (RS1) deleted some files after it completed the 
> recovery. When RS2 really opens the region, it might not see the recent data 
> that was written by flushes on hor9n10 during the recovery process. Reads of 
> that data would have inconsistencies.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12228) Backport HBASE-11373 (hbase-protocol compile failed for name conflict of RegionTransition) to 0.98

2014-10-11 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168264#comment-14168264
 ] 

Jimmy Xiang commented on HBASE-12228:
-

+1. Thanks!

> Backport HBASE-11373 (hbase-protocol compile failed for name conflict of 
> RegionTransition) to 0.98
> --
>
> Key: HBASE-12228
> URL: https://issues.apache.org/jira/browse/HBASE-12228
> Project: HBase
>  Issue Type: Bug
>Reporter: Andrew Purtell
>Assignee: Andrew Purtell
> Fix For: 0.98.8
>
> Attachments: HBASE-12228-0.98.patch
>
>
> {quote}
> RegionServerStatus.proto:81:9: "RegionTransition" is already defined in file 
> "ZooKeeper.proto".
> RegionServerStatus.proto:114:12: "RegionTransition" seems to be defined in 
> "ZooKeeper.proto", which is not imported by "RegionServerStatus.proto".  To 
> use it here, please add the necessary import.
> {quote}
> This was introduced into 0.98 in e6ffa86e
> {noformat}
> commit e6ffa86e33ee173afcff15ca4b614e6ec56357ed
> Author: Andrew Purtell 
> Date:   Tue Aug 26 08:01:09 2014 -0700
> HBASE-11546 Backport ZK-less region assignment to 0.98 (Virag Kothari) 
> [1/8]
> 
> HBASE-11059 ZK-less region assignment (Jimmy Xiang
> {noformat}
> There's a later fix for this that needs to be applied:
> {noformat}
> commit 175f133dbc127d7eb2ba5693cc6b2e4fe3c51655
> Author: Jimmy Xiang 
> Date:   Wed Jun 18 08:38:05 2014 -0700
> HBASE-11373 hbase-protocol compile failed for name conflict of 
> RegionTransition
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12230) User impersonation does not work in 'simple' mode.

2014-10-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167699#comment-14167699
 ] 

Jimmy Xiang commented on HBASE-12230:
-

You want to use doAs without authentication?

> User impersonation does not work in 'simple' mode.
> --
>
> Key: HBASE-12230
> URL: https://issues.apache.org/jira/browse/HBASE-12230
> Project: HBase
>  Issue Type: Bug
>  Components: REST, security
>Affects Versions: 0.98.6.1
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
> Attachments: 
> HBASE-12230-User-impersonation-does-not-work-in-simp.patch
>
>
> The [code responsible for initializing proxy 
> configuration|https://github.com/apache/hbase/blob/7cfdb38c9274e306ac37374c147a978c2cef31d6/hbase-server/src/main/java/org/apache/hadoop/hbase/security/HBasePolicyProvider.java#L54]
>  does not execute unless {{"hadoop.security.authorization"}} is set to true. 
> This is departure from other Hadoop components. Impersonation should not be 
> tied to authorization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12216) Lower closed region logging level

2014-10-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12216:

   Resolution: Fixed
Fix Version/s: (was: 0.99.1)
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

The test is ok locally. Integrated into branch master. Thanks.

> Lower closed region logging level
> -
>
> Key: HBASE-12216
> URL: https://issues.apache.org/jira/browse/HBASE-12216
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-12216.patch
>
>
> There are quite a few ERROR messages in the log, which sound like problems, 
> but actually they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12216) Lower closed region logging level

2014-10-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165415#comment-14165415
 ] 

Jimmy Xiang commented on HBASE-12216:
-

Now, double closes are due to retries when the master restarts, just in case the 
previous close request (sent before the master crashed) wasn't received by the 
region server yet. They are no longer signs of things going wrong.
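
A toy sketch of the kind of log-level change this implies (the guard and names are illustrative only):

{code}
// Toy sketch: a close request for a region that is already closed is expected
// when a restarted master retries the close, so log it below ERROR.
public class CloseLogLevelSketch {

  static void onCloseRequest(boolean regionOnline, String regionName) {
    if (!regionOnline) {
      System.out.println("DEBUG: received CLOSE for " + regionName
          + " which is not online; likely a retried close after a master restart");
      return;
    }
    System.out.println("closing region " + regionName);
  }

  public static void main(String[] args) {
    onCloseRequest(false, "689b77e1bad7e951b0d9ef4663b217e9");
  }
}
{code}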

> Lower closed region logging level
> -
>
> Key: HBASE-12216
> URL: https://issues.apache.org/jira/browse/HBASE-12216
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12216.patch
>
>
> There are quite a few ERROR messages in the log, which sound like problems, 
> but actually they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12216) Lower closed region logging level

2014-10-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12216:

Fix Version/s: 0.99.1
   Status: Patch Available  (was: Open)

> Lower closed region logging level
> -
>
> Key: HBASE-12216
> URL: https://issues.apache.org/jira/browse/HBASE-12216
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12216.patch
>
>
> There are quite a few ERROR messages in the log, which sound like problems, 
> but actually they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12216) Lower closed region logging level

2014-10-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12216:

Attachment: hbase-12216.patch

> Lower closed region logging level
> -
>
> Key: HBASE-12216
> URL: https://issues.apache.org/jira/browse/HBASE-12216
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: hbase-12216.patch
>
>
> There are quite a few ERROR messages in the log, which sound like problems, 
> but actually they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12216) Lower closed region logging level

2014-10-09 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12216:
---

 Summary: Lower closed region logging level
 Key: HBASE-12216
 URL: https://issues.apache.org/jira/browse/HBASE-12216
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 2.0.0


There are quite a few ERROR messages in the log, which sound like problems, but 
actually they are not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12209) NPE in HRegionServer#getLastSequenceId

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12209:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into branch 1 and master. Thanks.

> NPE in HRegionServer#getLastSequenceId
> --
>
> Key: HBASE-12209
> URL: https://issues.apache.org/jira/browse/HBASE-12209
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12209.patch
>
>
> The region server got the log splitting task, but the master is gone.
> {noformat}
> 2014-10-08 08:31:22,089 ERROR [RS_LOG_REPLAY_OPS-a2428:20020-1] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getLastSequenceId(HRegionServer.java:2113)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:317)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:218)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:103)
> at 
> org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:72)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12209) NPE in HRegionServer#getLastSequenceId

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12209:

Attachment: hbase-12209.patch

> NPE in HRegionServer#getLastSequenceId
> --
>
> Key: HBASE-12209
> URL: https://issues.apache.org/jira/browse/HBASE-12209
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12209.patch
>
>
> The region server got the log splitting task, but the master is gone.
> {noformat}
> 2014-10-08 08:31:22,089 ERROR [RS_LOG_REPLAY_OPS-a2428:20020-1] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getLastSequenceId(HRegionServer.java:2113)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:317)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:218)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:103)
> at 
> org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:72)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12209) NPE in HRegionServer#getLastSequenceId

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12209:

Status: Patch Available  (was: Open)

> NPE in HRegionServer#getLastSequenceId
> --
>
> Key: HBASE-12209
> URL: https://issues.apache.org/jira/browse/HBASE-12209
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12209.patch
>
>
> The region server got the log splitting task, but the master is gone.
> {noformat}
> 2014-10-08 08:31:22,089 ERROR [RS_LOG_REPLAY_OPS-a2428:20020-1] 
> executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getLastSequenceId(HRegionServer.java:2113)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:317)
> at 
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:218)
> at 
> org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:103)
> at 
> org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:72)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12209) NPE in HRegionServer#getLastSequenceId

2014-10-08 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12209:
---

 Summary: NPE in HRegionServer#getLastSequenceId
 Key: HBASE-12209
 URL: https://issues.apache.org/jira/browse/HBASE-12209
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: 2.0.0, 0.99.1


The region server got the log splitting task, but the master is gone.
{noformat}
2014-10-08 08:31:22,089 ERROR [RS_LOG_REPLAY_OPS-a2428:20020-1] 
executor.EventHandler: Caught throwable while processing event RS_LOG_REPLAY
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getLastSequenceId(HRegionServer.java:2113)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:317)
at 
org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:218)
at 
org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:103)
at 
org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:72)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12206) NPE in RSRpcServices

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12206:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into branch 1 and master. Thanks.

> NPE in RSRpcServices
> 
>
> Key: HBASE-12206
> URL: https://issues.apache.org/jira/browse/HBASE-12206
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12206.patch
>
>
> Looks "leases" is null, which is possible since the region server is not open 
> yet. Will add a check.
> {noformat}
> 2014-10-08 08:38:17,985 ERROR 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> 2014-10-08 08:38:17,988 DEBUG 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: 
> B.defaultRpcServer.handler=0,queue=0,port=20020: callId: 645 service: 
> ClientService methodName: Scan size: 22 connection: 10.20.212.36:53810
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2054)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12206) NPE in RSRpcServices

2014-10-08 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163805#comment-14163805
 ] 

Jimmy Xiang commented on HBASE-12206:
-

Sure. Will make it DEBUG. Thanks.

> NPE in RSRpcServices
> 
>
> Key: HBASE-12206
> URL: https://issues.apache.org/jira/browse/HBASE-12206
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12206.patch
>
>
> Looks "leases" is null, which is possible since the region server is not open 
> yet. Will add a check.
> {noformat}
> 2014-10-08 08:38:17,985 ERROR 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> 2014-10-08 08:38:17,988 DEBUG 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: 
> B.defaultRpcServer.handler=0,queue=0,port=20020: callId: 645 service: 
> ClientService methodName: Scan size: 22 connection: 10.20.212.36:53810
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2054)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12206) NPE in RSRpcServices

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12206:

Fix Version/s: 0.99.1
   2.0.0
   Status: Patch Available  (was: Open)

> NPE in RSRpcServices
> 
>
> Key: HBASE-12206
> URL: https://issues.apache.org/jira/browse/HBASE-12206
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12206.patch
>
>
> Looks "leases" is null, which is possible since the region server is not open 
> yet. Will add a check.
> {noformat}
> 2014-10-08 08:38:17,985 ERROR 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> 2014-10-08 08:38:17,988 DEBUG 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: 
> B.defaultRpcServer.handler=0,queue=0,port=20020: callId: 645 service: 
> ClientService methodName: Scan size: 22 connection: 10.20.212.36:53810
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2054)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12206) NPE in RSRpcServices

2014-10-08 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12206:

Attachment: hbase-12206.patch

> NPE in RSRpcServices
> 
>
> Key: HBASE-12206
> URL: https://issues.apache.org/jira/browse/HBASE-12206
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12206.patch
>
>
> Looks "leases" is null, which is possible since the region server is not open 
> yet. Will add a check.
> {noformat}
> 2014-10-08 08:38:17,985 ERROR 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: Unexpected 
> throwable object
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> 2014-10-08 08:38:17,988 DEBUG 
> [B.defaultRpcServer.handler=0,queue=0,port=20020] ipc.RpcServer: 
> B.defaultRpcServer.handler=0,queue=0,port=20020: callId: 645 service: 
> ClientService methodName: Scan size: 22 connection: 10.20.212.36:53810
> java.io.IOException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2054)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> at java.lang.Thread.run(Thread.java:724)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> ... 4 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12206) NPE in RSRpcServices

2014-10-08 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12206:
---

 Summary: NPE in RSRpcServices
 Key: HBASE-12206
 URL: https://issues.apache.org/jira/browse/HBASE-12206
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Looks "leases" is null, which is possible since the region server is not open 
yet. Will add a check.

{noformat}
2014-10-08 08:38:17,985 ERROR [B.defaultRpcServer.handler=0,queue=0,port=20020] 
ipc.RpcServer: Unexpected throwable object
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:724)
2014-10-08 08:38:17,988 DEBUG [B.defaultRpcServer.handler=0,queue=0,port=20020] 
ipc.RpcServer: B.defaultRpcServer.handler=0,queue=0,port=20020: callId: 645 
service: ClientService methodName: Scan size: 22 connection: 10.20.212.36:53810
java.io.IOException
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2054)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:1957)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30422)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
... 4 more
{noformat}
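
A minimal standalone sketch of the kind of guard described above (the names, including the leases stand-in and the scan entry point, are assumptions for illustration, not the actual HBASE-12206 patch):

{code}
import java.io.IOException;

// Sketch only: if the component tracking scanner leases has not been
// initialized because the server is still starting, reject the call with a
// clear error instead of letting a NullPointerException escape the handler.
public class ScanGuardSketch {

  // Stand-in for the server-side lease tracker ("leases" in the report);
  // it stays null until the region server has fully started.
  static Object leases = null;

  static void scan(String scannerId) throws IOException {
    if (leases == null) {
      throw new IOException("Server is not running yet; scanner leases not initialized");
    }
    // ... normal scan handling would go here ...
  }

  public static void main(String[] args) {
    try {
      scan("scanner-1");
    } catch (IOException e) {
      System.out.println("Rejected cleanly: " + e.getMessage());
    }
  }
}
{code}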



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12196:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into branch 1 and master. Thanks.

> SSH should retry in case failed to assign regions
> -
>
> Key: HBASE-12196
> URL: https://issues.apache.org/jira/browse/HBASE-12196
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12196.patch, hbase-12196_v2.patch
>
>
> If only the master is alive and all regionservers are down, SSH can't find a 
> plan to assign user regions. In this case, SSH should retry.
> {noformat}
> 2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.io.IOException: Unable to determine a plan to assign region(s)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12196:

Attachment: hbase-12196_v2.patch

Attached v2 that added one test to cover this.

> SSH should retry in case failed to assign regions
> -
>
> Key: HBASE-12196
> URL: https://issues.apache.org/jira/browse/HBASE-12196
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12196.patch, hbase-12196_v2.patch
>
>
> If only the master is alive and all regionservers are down, SSH can't find a 
> plan to assign user regions. In this case, SSH should retry.
> {noformat}
> 2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.io.IOException: Unable to determine a plan to assign region(s)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12196:

Status: Patch Available  (was: Open)

> SSH should retry in case failed to assign regions
> -
>
> Key: HBASE-12196
> URL: https://issues.apache.org/jira/browse/HBASE-12196
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12196.patch, hbase-12196_v2.patch
>
>
> If only the master is alive and all regionservers are down, SSH can't find a 
> plan to assign user regions. In this case, SSH should retry.
> {noformat}
> 2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.io.IOException: Unable to determine a plan to assign region(s)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14162704#comment-14162704
 ] 

Jimmy Xiang commented on HBASE-12196:
-

Yes, we can have a test for this one. Let me add one. Thanks.

> SSH should retry in case failed to assign regions
> -
>
> Key: HBASE-12196
> URL: https://issues.apache.org/jira/browse/HBASE-12196
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12196.patch
>
>
> If only the master is alive and all regionservers are down, SSH can't find a 
> plan to assign user regions. In this case, SSH should retry.
> {noformat}
> 2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.io.IOException: Unable to determine a plan to assign region(s)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12196:

Attachment: hbase-12196.patch

> SSH should retry in case failed to assign regions
> -
>
> Key: HBASE-12196
> URL: https://issues.apache.org/jira/browse/HBASE-12196
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12196.patch
>
>
> If only the master is alive and all regionservers are down, SSH can't find a 
> plan to assign user regions. In this case, SSH should retry.
> {noformat}
> 2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.io.IOException: Unable to determine a plan to assign region(s)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12196) SSH should retry in case failed to assign regions

2014-10-07 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12196:
---

 Summary: SSH should retry in case failed to assign regions
 Key: HBASE-12196
 URL: https://issues.apache.org/jira/browse/HBASE-12196
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 2.0.0, 0.99.1


If only the master is alive and all regionservers are down, SSH can't find a 
plan to assign user regions. In this case, SSH should retry.

{noformat}
2014-10-07 14:05:18,310 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.io.IOException: Unable to determine a plan to assign region(s)
at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1411)
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:272)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
{noformat}
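
For readers skimming the thread, a standalone sketch of the retry idea described above (assumed names, not the actual ServerShutdownHandler change): keep retrying the assignment step with a short pause until a plan can be produced, instead of failing the whole M_SERVER_SHUTDOWN event.

{code}
import java.io.IOException;

public class RetryAssignSketch {

  interface Assigner {
    // Throws IOException when no plan can be determined (e.g. no live region servers).
    void assign() throws IOException;
  }

  static void assignWithRetry(Assigner assigner, long pauseMillis) throws InterruptedException {
    while (true) {
      try {
        assigner.assign();
        return; // a plan was found and the regions were assigned
      } catch (IOException e) {
        // "Unable to determine a plan" -- wait for servers to come back, then retry.
        System.out.println("Assignment failed (" + e.getMessage() + "), retrying...");
        Thread.sleep(pauseMillis);
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final int[] attempts = {0};
    assignWithRetry(() -> {
      if (++attempts[0] < 3) {
        throw new IOException("Unable to determine a plan to assign region(s)");
      }
      System.out.println("Assigned on attempt " + attempts[0]);
    }, 100);
  }
}
{code}

A real fix would presumably cap the retries or respect master shutdown; the loop above only shows the shape of the idea.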



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-11838) Enable PREFIX_TREE in integration tests

2014-10-07 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-11838.
-
   Resolution: Fixed
Fix Version/s: 0.99.1
   2.0.0
 Hadoop Flags: Reviewed

With HBASE-11728 and HBASE-12078, ITBLL with PREFIX_TREE encoding works fine 
for me now. Integrated the patch to branch 1 and master. ITBLL tests all 
supported data encodings from now on. Thanks.

> Enable PREFIX_TREE in integration tests
> ---
>
> Key: HBASE-11838
> URL: https://issues.apache.org/jira/browse/HBASE-11838
> Project: HBase
>  Issue Type: Test
>  Components: test
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Minor
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-11838.patch
>
>
> HBASE-11728 fixed a PREFIX_TREE encoding bug. Let's try to enable the 
> encoding in integration tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12184) ServerShutdownHandler throws NPE

2014-10-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12184:

   Resolution: Fixed
Fix Version/s: 0.98.7
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Integrated into branch 0.98, 1, and master. Thanks.

> ServerShutdownHandler throws NPE
> 
>
> Key: HBASE-12184
> URL: https://issues.apache.org/jira/browse/HBASE-12184
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.98.7, 0.99.1
>
> Attachments: hbase-12184.patch
>
>
> {noformat}
> 2014-10-06 16:59:22,219 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:190)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12078) Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING

2014-10-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161268#comment-14161268
 ] 

Jimmy Xiang commented on HBASE-12078:
-

Playing with ITBLL with PREFIX_TREE encoding enabled (HBASE-11838). It seems 
there is no bug with this encoding anymore. Good job!

> Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING
> 
>
> Key: HBASE-12078
> URL: https://issues.apache.org/jira/browse/HBASE-12078
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6.1
> Environment: CentOS 6.3
> hadoop 2.5.0(hdfs)
> hadoop 2.2.0(hbase)
> hbase 0.98.6.1
> sun-jdk 1.7.0_67-b01
>Reporter: zhangduo
>Assignee: zhangduo
>Priority: Critical
> Fix For: 2.0.0, 0.98.7, 0.99.1
>
> Attachments: HBASE-12078-0.98.patch, HBASE-12078.patch, 
> HBASE-12078_1.patch, prefix_tree_error.patch
>
>
> our row key is composed of two ints, and we found that sometimes when we 
> scan using only the first int part, the returned result may be missing some 
> rows. But when we dump the whole hfile, the row is still there.
> We have written a testcase to reproduce the bug. It works like this:
> put 1-12345
> put 12345-0x0100
> put 12345-0x0101
> put 12345-0x0200
> put 12345-0x0202
> put 12345-0x0300
> put 12345-0x0303
> put 12345-0x0400
> put 12345-0x0404
> flush memstore
> then scan using 12345, the returned row key will be 
> 12345-0x2000 (12345-0x1000 expected)
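
A reproduction outline of the scenario above, assuming the HBase 1.x client API, a reachable cluster, and an existing table 'test' with a family 'f' configured with PREFIX_TREE data block encoding; this is an illustration, not the test case attached to the issue.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixTreeScanRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName name = TableName.valueOf("test");
    byte[] family = Bytes.toBytes("f");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         Admin admin = conn.getAdmin()) {
      // Write the rows listed in the report, then flush so they land in an HFile.
      String[] rows = {"1-12345", "12345-0x0100", "12345-0x0101", "12345-0x0200",
                       "12345-0x0202", "12345-0x0300", "12345-0x0303",
                       "12345-0x0400", "12345-0x0404"};
      for (String r : rows) {
        Put put = new Put(Bytes.toBytes(r));
        put.addColumn(family, Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(put);
      }
      admin.flush(name);

      // Scan by the first part of the key only and compare against what was written.
      Scan scan = new Scan();
      scan.setStartRow(Bytes.toBytes("12345"));
      scan.setStopRow(Bytes.toBytes("12346"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
          System.out.println(Bytes.toString(result.getRow()));
        }
      }
    }
  }
}
{code}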



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12184) ServerShutdownHandler throws NPE

2014-10-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12184:

Status: Patch Available  (was: Open)

> ServerShutdownHandler throws NPE
> 
>
> Key: HBASE-12184
> URL: https://issues.apache.org/jira/browse/HBASE-12184
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12184.patch
>
>
> {noformat}
> 2014-10-06 16:59:22,219 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:190)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12184) ServerShutdownHandler throws NPE

2014-10-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12184:

Attachment: hbase-12184.patch

> ServerShutdownHandler throws NPE
> 
>
> Key: HBASE-12184
> URL: https://issues.apache.org/jira/browse/HBASE-12184
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12184.patch
>
>
> {noformat}
> 2014-10-06 16:59:22,219 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:190)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12184) ServerShutdownHandler throws NPE

2014-10-06 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12184:
---

 Summary: ServerShutdownHandler throws NPE
 Key: HBASE-12184
 URL: https://issues.apache.org/jira/browse/HBASE-12184
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 2.0.0, 0.99.1


{noformat}
2014-10-06 16:59:22,219 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-2] 
executor.EventHandler: Caught throwable while processing event M_SERVER_SHUTDOWN
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:190)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-12175) Can't create table

2014-10-05 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-12175.
-
Resolution: Invalid

Could be my env issue.

> Can't create table
> --
>
> Key: HBASE-12175
> URL: https://issues.apache.org/jira/browse/HBASE-12175
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>
> Tried to create a table from the hbase shell and couldn't get the region assigned:
> {noformat}
> ^Gdefault^R^Dtest
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
> at 
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> Caused by: java.lang.IllegalArgumentException: Illegal character <10> at 0. 
> Namespaces can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9]:
> ^Gdefault^R^Dtest
> at 
> org.apache.hadoop.hbase.TableName.isLegalNamespaceName(TableName.java:215)
> at 
> org.apache.hadoop.hbase.TableName.isLegalNamespaceName(TableName.java:204)
> at org.apache.hadoop.hbase.TableName.<init>(TableName.java:302)
> at 
> org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:339)
> at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:460)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-12175) Can't create table

2014-10-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-12175:
---

 Summary: Can't create table
 Key: HBASE-12175
 URL: https://issues.apache.org/jira/browse/HBASE-12175
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang


Tried to create a table from the hbase shell and couldn't get the region assigned:

{noformat}
^Gdefault^R^Dtest
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2213)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
Caused by: java.lang.IllegalArgumentException: Illegal character <10> at 0. 
Namespaces can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9]:
^Gdefault^R^Dtest
at 
org.apache.hadoop.hbase.TableName.isLegalNamespaceName(TableName.java:215)
at 
org.apache.hadoop.hbase.TableName.isLegalNamespaceName(TableName.java:204)
at org.apache.hadoop.hbase.TableName.<init>(TableName.java:302)
at 
org.apache.hadoop.hbase.TableName.createTableNameIfNecessary(TableName.java:339)
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:460)
{noformat}
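
As background for the exception above, a small standalone sketch of the naming rule it quotes (namespaces may only contain [a-zA-Z_0-9]); the check mirrors the message in the trace but is an illustration, not the actual TableName code:

{code}
public class NamespaceNameCheckSketch {

  static void checkNamespace(String name) {
    for (int i = 0; i < name.length(); i++) {
      char c = name.charAt(i);
      boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || c == '_';
      if (!ok) {
        throw new IllegalArgumentException("Illegal character <" + (int) c + "> at " + i
            + ". Namespaces can only contain 'alphanumeric characters': i.e. [a-zA-Z_0-9]: " + name);
      }
    }
  }

  public static void main(String[] args) {
    checkNamespace("default");                        // passes
    checkNamespace("\u0007default\u0012\u0004test");  // fails at 0: control bytes, as in the report
  }
}
{code}

The name in the report contains control characters, which is consistent with something other than a plain table name reaching the check (the issue was resolved as an environment problem).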



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12166:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into branch 1 and master. Thanks.

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, hbase-12166_v2.patch, 
> log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 

[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12166:

Attachment: hbase-12166_v2.patch

Attached v2 that fixed the issue Stack found.

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, hbase-12166_v2.patch, 
> log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
> 

[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158679#comment-14158679
 ] 

Jimmy Xiang commented on HBASE-12166:
-

[~stack], good catch! Unbelievable!

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
> This would see

[jira] [Comment Edited] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158515#comment-14158515
 ] 

Jimmy Xiang edited comment on HBASE-12166 at 10/3/14 11:08 PM:
---

I think I found out the cause. In 
ZKSplitLogManagerCoordination#removeRecoveringRegions:

{noformat}
  listSize = failedServers.size();
  for (int j = 0; j < listSize; j++) {
{noformat}

The listSize is redefined.


was (Author: jxiang):
I think I found out the cause. In 
ZKSplitLogManagerCoordination#removeRecoveringRegions:

{noformat}
  listSize = failedServers.size();
  for (int j = 0; j < listSize; j++) {
{noformat}

The listSize is redefined. That's not a bug, it is a hidden bomb :)
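
For readers skimming the thread, a tiny standalone illustration of why that reuse of listSize matters (hypothetical lists; the real method is ZKSplitLogManagerCoordination#removeRecoveringRegions): the inner loop reassigns the variable that bounds the outer loop, so the outer loop exits early and some regions are never processed.

{code}
import java.util.Arrays;
import java.util.List;

public class ListSizeReuseSketch {
  public static void main(String[] args) {
    List<String> regions = Arrays.asList("r1", "r2", "r3", "hbase:namespace");
    List<String> failedServers = Arrays.asList("serverA");

    int listSize = regions.size();
    for (int i = 0; i < listSize; i++) {      // intended bound: regions.size() == 4
      System.out.println("processing region " + regions.get(i));
      listSize = failedServers.size();        // BUG: clobbers the outer loop's bound
      for (int j = 0; j < listSize; j++) {
        System.out.println("  checking failed server " + failedServers.get(j));
      }
    }
    // Only one region is processed instead of four; the remaining ones never
    // complete recovery -- in the test it was the hbase:namespace region that
    // got left behind.
  }
}
{code}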

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> R

[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158649#comment-14158649
 ] 

Jimmy Xiang commented on HBASE-12166:
-

TestRegionReplicaReplicationEndpoint is ok locally. I can increase the timeout 
a little at checkin (from 1000 to 6000?).

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode del

[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158645#comment-14158645
 ] 

Jimmy Xiang commented on HBASE-12166:
-

[~stack], [~jeffreyz], could you take a look at the patch? Thanks.

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 complete

[jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158641#comment-14158641
 ] 

Jimmy Xiang commented on HBASE-12166:
-

TestMasterObserver should be fixed by the addendum of HBASE-12167.

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we've finished log splitting long time previous:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 comp

[jira] [Commented] (HBASE-12167) NPE in AssignmentManager

2014-10-03 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158598#comment-14158598
 ] 

Jimmy Xiang commented on HBASE-12167:
-

Checked in an addendum to fix TestMasterObserver.

> NPE in AssignmentManager
> 
>
> Key: HBASE-12167
> URL: https://issues.apache.org/jira/browse/HBASE-12167
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12167.patch
>
>
> If we can't find a region plan, we should check for null instead of hitting the NPE below.
> {noformat}
> 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}
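
The stack trace above points at a missing null check on the region plan. As a rough illustration only (the class, method, and field names below are hypothetical, not the real AssignmentManager API), a guard of the following shape avoids the NPE when no plan can be produced:

{code}
import java.util.List;

// Hypothetical, simplified sketch of the null check the description asks for.
public class AssignSketch {

  static final class RegionPlan {
    final String region;
    final String destinationServer;
    RegionPlan(String region, String destinationServer) {
      this.region = region;
      this.destinationServer = destinationServer;
    }
  }

  /** Returns null when no live server is available to host the region. */
  static RegionPlan getRegionPlan(String region, List<String> liveServers) {
    if (liveServers == null || liveServers.isEmpty()) {
      return null;
    }
    return new RegionPlan(region, liveServers.get(0));
  }

  static void assign(String region, List<String> liveServers) {
    RegionPlan plan = getRegionPlan(region, liveServers);
    if (plan == null) {
      // Without this guard the caller dereferences a null plan and throws an NPE,
      // as in the stack trace above. Skip (or retry later) instead.
      System.out.println("No plan for " + region + "; skipping assignment for now");
      return;
    }
    System.out.println("Assigning " + region + " to " + plan.destinationServer);
  }

  public static void main(String[] args) {
    assign("test-region", List.of());             // no servers: no plan, no NPE
    assign("test-region", List.of("rs1:16020"));  // normal path
  }
}
{code}

Whether the right recovery is to skip, retry, or fail the server-shutdown handler is a separate question; the point of the sketch is only that a null plan is a reachable state and must be handled.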



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12167) NPE in AssignmentManager

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12167:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Integrated into branch 1 and master. Thanks.

> NPE in AssignmentManager
> 
>
> Key: HBASE-12167
> URL: https://issues.apache.org/jira/browse/HBASE-12167
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: hbase-12167.patch
>
>
> If we can't find a region plan, we should check for null instead of hitting the NPE below.
> {noformat}
> 2014-10-02 18:36:27,719 ERROR [MASTER_SERVER_OPERATIONS-a2424:20020-0] 
> executor.EventHandler: Caught throwable while processing event 
> M_SERVER_SHUTDOWN
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1417)
> at 
> org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1409)
> at 
> org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:271)
> at 
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12166:

Component/s: wal

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test, wal
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we finished log splitting a long time earlier:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
> This would seem to indicate that we successfully wrote zk that we are 
> recovering:
> 
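
For context on why every Get in the stack trace above fails: a region that is still flagged as recovering rejects client operations at the very start of the operation, which is what RegionInRecoveryException signals. Below is a minimal sketch of that guard pattern, using made-up names and a plain IOException rather than the real HRegion/RegionInRecoveryException code:

{code}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the "reject reads while recovering" guard; not HBase code.
public class RecoveringRegionSketch {

  // In the real server this flag is cleared when the region's
  // /hbase/recovering-regions znode is deleted; here it is just a field.
  private final AtomicBoolean recovering = new AtomicBoolean(true);
  private final String regionName;

  RecoveringRegionSketch(String regionName) {
    this.regionName = regionName;
  }

  void startRegionOperation() throws IOException {
    if (recovering.get()) {
      // Every Get/Scan fails here until recovery is marked complete,
      // which matches what the test observes for the namespace region.
      throw new IOException(regionName + " is recovering");
    }
  }

  void markRecovered() {
    recovering.set(false);
  }

  public static void main(String[] args) throws IOException {
    RecoveringRegionSketch region = new RecoveringRegionSketch("hbase:namespace");
    try {
      region.startRegionOperation();
    } catch (IOException e) {
      System.out.println("Read rejected: " + e.getMessage());
    }
    region.markRecovered();        // what the missing znode deletion should have triggered
    region.startRegionOperation(); // now allowed
    System.out.println("Read allowed after recovery");
  }
}
{code}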

[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12166:

Status: Patch Available  (was: Open)

Attached a simple patch. The test passes locally now. Let's see what Jenkins 
says. Hopefully this is the last DLR bug.
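
For readers following the log excerpts quoted below: the signal that log replay for a region is finished is the deletion of that region's znode under /hbase/recovering-regions. Here is a minimal sketch of watching for that deletion with a raw ZooKeeper client. The znode path is taken from the quoted logs, the quorum address is an assumption, and this is only an illustration of the pattern, not the actual RecoveringRegionWatcher implementation (which also re-arms watches and updates region state):

{code}
import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RecoveringRegionZkWatch {

  public static void main(String[] args)
      throws IOException, KeeperException, InterruptedException {
    String quorum = "localhost:2181";                       // assumed ZK quorum
    String znode = "/hbase/recovering-regions/1588230740";  // region znode from the logs

    ZooKeeper zk = new ZooKeeper(quorum, 30_000, event -> { });

    Watcher deletionWatcher = (WatchedEvent event) -> {
      if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
        // Deletion of the znode is the "region completes recovery" signal in the logs.
        System.out.println(event.getPath() + " deleted; region completes recovery");
      }
    };

    // exists() both checks the current state and arms a one-shot watch on the znode.
    if (zk.exists(znode, deletionWatcher) == null) {
      System.out.println(znode + " is already gone; region is not recovering");
    } else {
      System.out.println("Waiting for " + znode + " to be deleted...");
    }

    Thread.sleep(60_000); // keep the session alive long enough to observe the event
    zk.close();
  }
}
{code}

A region that never sees this deletion, as the description shows for the namespace region, stays in the recovering state indefinitely, which matches the RegionInRecoveryException hit by the test's Get calls.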

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we finished log splitting a long time earlier:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.

[jira] [Updated] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork

2014-10-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-12166:

Attachment: hbase-12166.patch

> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---
>
> Key: HBASE-12166
> URL: https://issues.apache.org/jira/browse/HBASE-12166
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: Jimmy Xiang
> Fix For: 2.0.0, 0.99.1
>
> Attachments: 12166.txt, hbase-12166.patch, log.txt
>
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have 
> finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG 
> [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): 
> B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: 
> ClientService methodName: Get
>   size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: 
> hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at 
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
> See how we finished log splitting a long time earlier:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] 
> master.SplitLogManager(294): finished splitting (more than or equal to) 
> 197337 bytes in 1 log files in 
> [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting]
>  in 379ms
> {code}
> If I grep for the deleting of znodes on recovery, which is when we set the 
> recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 
> znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. 
> Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. 
> Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. 
> Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. 
> Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. 
> Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. 
> Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] 
> zookeeper.RecoveringRegionWatcher(66): 
> /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. 
> Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
> This would seem to indicate that we successfully wrote zk that we are 
> recovering:
