[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2019-02-01 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20908:
---
Fix Version/s: (was: 1.5.0)

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 1.3.3, 1.4.6, 2.2.0, 1.2.7
>
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch
>
>
> Steps to reproduce
> {code}
> hbase(main):003:0> create 'myTable','cf',{REGION_REPLICATION=>3}
> hbase(main):003:0> put 'myTable','r1','cf:col1','1'
> 0 row(s) in 0.1230 seconds
> hbase(main):004:0> disable 'myTable'
> 0 row(s) in 2.3040 seconds
> hbase(main):005:0> alter 'myTable',{REGION_REPLICATION=>1}
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 11.9550 seconds
> hbase(main):006:0> enable 'myTable'
> 0 row(s) in 1.2620 seconds
> hbase(main):007:0> put 'myTable','r2','cf:col1','1'
> 0 row(s) in 0.0060 seconds
> {code}
> The replay request targets a replica region that is no longer present in 
> meta but is still in the location cache, so the server responds that it is 
> not serving this region.
> {code}
> com.google.protobuf.ServiceException: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException):
>  org.apache.hadoop.hbase.NotServingRegionException: Region 
> d997d9b47a106216b9b117617ec09015 is not online on 
> 10.22.9.76,16020,1531341039091
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3124)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3106)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1714)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22773)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {code}
> Eventually, once the cache is refreshed from meta, we enter an infinite 
> loop: the edit can never be replicated because the location of the removed 
> replica will never appear again.
> {code}
> java.net.SocketTimeoutException: callTimeout=120, callDuration=2181316: 
> Can't get the location null
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:170)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint$RetryingRpcCallable.call(RegionReplicaReplicationEndpoint.java:606)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't 
> get the location
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:178)
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:105)
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:89)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   ... 5 more
> Caused by: java.io.IOException: HRegionInfo was null in myTable, 
> row=keyvalues={myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:regioninfo/1531262022425/Put/vlen=41/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:seqnumDuringOpen/1531341209944/Put/vlen=8/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:server/1531341209944/Put/vlen=16/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:serverstartcode/1531341209944/Put/vlen=8/seqid=0}
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1289)
>   
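
The failure mode described above can be illustrated with a small simulation. This is a minimal sketch, not the attached patch: the class ReplicaReplaySketch, the method replay, and the guardEnabled flag are hypothetical names, and the skip rule (drop an edit whose replica id is no longer below the table's configured REGION_REPLICATION) is an assumed guard that may differ from what the committed fix does.

```java
// Hypothetical, self-contained sketch. None of these names are HBase APIs.
final class ReplicaReplaySketch {

    /**
     * Simulates replaying one edit to a region replica and returns the number
     * of attempts used. maxAttempts stands in for "forever" so the loop ends.
     *
     * @param replicaId          replica the queued edit targets (0 is the primary)
     * @param configuredReplicas REGION_REPLICATION currently set on the table
     * @param guardEnabled       assumed guard: skip edits for removed replicas
     * @param maxAttempts        upper bound on retries in this simulation
     */
    static int replay(int replicaId, int configuredReplicas,
                      boolean guardEnabled, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // A replica id at or above the configured count was removed by the alter.
            boolean removed = replicaId >= configuredReplicas;
            if (guardEnabled && removed) {
                // The location can never reappear, so drop the edit instead of retrying.
                return attempt;
            }
            // Meta no longer lists a removed replica, so the lookup finds nothing.
            boolean located = !removed;
            if (located) {
                return attempt; // replay succeeds
            }
            // Otherwise: "Can't get the location", so try again.
        }
        return maxAttempts;
    }
}
```

With REGION_REPLICATION reduced from 3 to 1, an edit queued for replica 2 uses up every attempt when the guard is off and stops after the first attempt when it is on.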

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-23 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20908:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.2.0
>
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-23 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20908:
---

This seems to already be committed to master and branch-2. Let me check the 
branch-1 patch and commit it, then resolve this.



> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.2.0
>
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-23 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20908:
---
Fix Version/s: 2.2.0
   1.4.6
   1.3.3
   1.5.0
   3.0.0

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.2.0
>
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-23 Thread Andrew Purtell (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-20908:
---
Fix Version/s: 1.2.7

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Fix For: 3.0.0, 1.5.0, 1.2.7, 1.3.3, 1.4.6, 2.2.0
>
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-23 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Attachment: HBASE-20908_v3-branch-1.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-21 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20908:
---
Attachment: HBASE-20908_v3-branch-1.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
> Issue Type: Bug
> Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3.patch

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-21 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20908:
---
Attachment: 20908_v3-branch-1.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: 20908_v3-branch-1.patch, 20908_v3.patch, 
> HBASE-20908.patch, HBASE-20908_v1.patch, HBASE-20908_v3-branch-1.patch, 
> HBASE-20908_v3.patch
>
>
> Steps to reproduce
> {code}
> hbase(main):003:0> create 'myTable','cf',{REGION_REPLICATION=>3}
> hbase(main):003:0> put 'myTable','r1','cf:col1','1'
> 0 row(s) in 0.1230 seconds
> hbase(main):004:0> disable 'myTable'
> 0 row(s) in 2.3040 seconds
> hbase(main):005:0> alter 'myTable',{REGION_REPLICATION=>1}
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 11.9550 seconds
> hbase(main):006:0> enable 'myTable'
> 0 row(s) in 1.2620 seconds
> hbase(main):007:0> put 'myTable','r2','cf:col1','1'
> 0 row(s) in 0.0060 seconds
> {code}
> The replication request targets a replica region that is no longer present in
> meta but was still in the location cache, so the region server replies that it
> is not serving this region.
> {code}
> com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region d997d9b47a106216b9b117617ec09015 is not online on 10.22.9.76,16020,1531341039091
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3124)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3106)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1714)
>   at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22773)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {code}
> Eventually, once the cache has been refreshed from meta, we get into an
> infinite loop: the event can never be replicated because the location of the
> removed replica will not appear again.
> {code}
> java.net.SocketTimeoutException: callTimeout=120, callDuration=2181316: Can't get the location null
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:170)
>   at org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint$RetryingRpcCallable.call(RegionReplicaReplicationEndpoint.java:606)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:178)
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:105)
>   at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:89)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   ... 5 more
> Caused by: java.io.IOException: HRegionInfo was null in myTable, row=keyvalues={myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:regioninfo/1531262022425/Put/vlen=41/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:seqnumDuringOpen/1531341209944/Put/vlen=8/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:server/1531341209944/Put/vlen=16/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:serverstartcode/1531341209944/Put/vlen=8/seqid=0}
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1289)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1179)
>   at 
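
The failure mode in the description above can be sketched in isolation. The following is a minimal, self-contained Java simulation and not HBase code: `MetaTable`, `attemptsWithoutGuard`, and `shouldReplicate` are hypothetical names introduced only for illustration. It assumes the caller keeps asking for replica id 2 after the table's replication has been reduced from 3 to 1, and it shows one possible guard (skip the edit when the requested replica id is no longer within the configured replica count). Whether the attached patches use this exact check is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class ReplicaRetrySketch {

    /** Hypothetical stand-in for meta: one location per configured replica. */
    static final class MetaTable {
        private final List<String> locations = new ArrayList<>();

        MetaTable(int replicaCount) {
            setReplicaCount(replicaCount);
        }

        /** Simulates altering REGION_REPLICATION on the table. */
        void setReplicaCount(int n) {
            locations.clear();
            for (int i = 0; i < n; i++) {
                locations.add("server-" + i);
            }
        }

        int replicaCount() {
            return locations.size();
        }

        /** Returns the location of a replica, or empty if it no longer exists. */
        Optional<String> locate(int replicaId) {
            return replicaId >= 0 && replicaId < locations.size()
                    ? Optional.of(locations.get(replicaId))
                    : Optional.empty();
        }
    }

    /**
     * Naive retry: keeps looking up the replica until it is found.
     * The cap exists only so the simulation terminates; a caller without
     * a cap would spin forever once the replica has been removed.
     */
    static int attemptsWithoutGuard(MetaTable meta, int replicaId, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            if (meta.locate(replicaId).isPresent()) {
                return attempts;
            }
        }
        return attempts;
    }

    /** Guard (an assumption, not the confirmed fix): skip replicas that no longer exist. */
    static boolean shouldReplicate(MetaTable meta, int replicaId) {
        return replicaId >= 0 && replicaId < meta.replicaCount();
    }

    public static void main(String[] args) {
        MetaTable meta = new MetaTable(3);
        meta.setReplicaCount(1); // replicas reduced from 3 to 1

        System.out.println(attemptsWithoutGuard(meta, 2, 5)); // uses every attempt
        System.out.println(shouldReplicate(meta, 2));          // removed replica is skipped
        System.out.println(shouldReplicate(meta, 0));          // primary is still served
    }
}
```

With the guard in place the caller can drop edits destined for a removed replica instead of retrying a lookup that can never succeed.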

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-20 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Attachment: HBASE-20908_v3-branch-1.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: 20908_v3.patch, HBASE-20908.patch, HBASE-20908_v1.patch, 
> HBASE-20908_v3-branch-1.patch, HBASE-20908_v3.patch
>
>

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-20 Thread Ted Yu (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-20908:
---
Attachment: 20908_v3.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: 20908_v3.patch, HBASE-20908.patch, HBASE-20908_v1.patch, 
> HBASE-20908_v3.patch
>
>

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-19 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Attachment: HBASE-20908_v3.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20908.patch, HBASE-20908_v1.patch, 
> HBASE-20908_v3.patch
>
>

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-19 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Attachment: HBASE-20908_v1.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20908.patch, HBASE-20908_v1.patch
>
>

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-18 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Status: Patch Available  (was: Open)

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 2.0.0, 1.2.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20908.patch
>
>

[jira] [Updated] (HBASE-20908) Infinite loop on regionserver if region replica are reduced

2018-07-18 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated HBASE-20908:
--
Attachment: HBASE-20908.patch

> Infinite loop on regionserver if region replica are reduced 
> 
>
> Key: HBASE-20908
> URL: https://issues.apache.org/jira/browse/HBASE-20908
> Project: HBase
>  Issue Type: Bug
>  Components: read replicas
> Affects Versions: 1.2.0, 2.0.0
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-20908.patch
>
>
> Steps to reproduce
> {code}
> hbase(main):003:0> create 'myTable','cf',{REGION_REPLICATION=>3}
> hbase(main):003:0> put 'myTable','r1','cf:col1','1'
> 0 row(s) in 0.1230 seconds
> hbase(main):004:0> disable 'myTable'
> 0 row(s) in 2.3040 seconds
> hbase(main):005:0> alter 'myTable',{REGION_REPLICATION=>1}
> Updating all regions with the new schema...
> 1/1 regions updated.
> Done.
> 0 row(s) in 11.9550 seconds
> hbase(main):006:0> enable 'myTable'
> 0 row(s) in 1.2620 seconds
> hbase(main):007:0> put 'myTable','r2','cf:col1','1'
> 0 row(s) in 0.0060 seconds
> {code}
> The replication request targets a replica region that is no longer present in
> meta but was still in the location cache, so the region server replies that it
> is not serving this region.
> {code}
> com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region d997d9b47a106216b9b117617ec09015 is not online on 10.22.9.76,16020,1531341039091
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3124)
>   at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3106)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1714)
>   at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22773)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {code}
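To make the stale-cache situation described above concrete, here is a minimal, self-contained sketch. It is a toy model, not HBase code: the class `ReplicaLocationSketch` and all of its methods are hypothetical names invented for illustration, and the retry cap exists only so the example terminates. It assumes a client cache that still lists three replicas while meta lists one, as after the `alter` in the steps to reproduce.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a client-side location cache; hypothetical, not an HBase API. */
final class ReplicaLocationSketch {
    // replicaId -> server, as the client cached it before the alter
    private final Map<Integer, String> cache = new HashMap<>();
    // replicaId -> server, as meta lists it after the alter
    private final Map<Integer, String> meta = new HashMap<>();

    ReplicaLocationSketch(int cachedReplicas, int replicasInMeta) {
        for (int id = 0; id < cachedReplicas; id++) cache.put(id, "rs-" + id);
        for (int id = 0; id < replicasInMeta; id++) meta.put(id, "rs-" + id);
    }

    /** Returns the cached server for a replica, or null if none is cached. */
    String cachedLocation(int replicaId) {
        return cache.get(replicaId);
    }

    /** Drops the cached view and reloads it from meta. */
    void refreshFromMeta() {
        cache.clear();
        cache.putAll(meta);
    }

    /** Returns the attempt number that succeeded, or -1 if every attempt failed. */
    int replay(int replicaId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String server = cachedLocation(replicaId);
            if (server != null && server.equals(meta.get(replicaId))) {
                return attempt; // the replica really is served at the cached location
            }
            // Stale entry ("not serving") or no location at all: refresh and retry.
            refreshFromMeta();
        }
        return -1; // without the cap, this loop would never end for a removed replica
    }
}
```

With `new ReplicaLocationSketch(3, 1)`, `replay(2, n)` returns -1 for any `n`, because replica 2 is first found only in the stale cache and then not found at all, while `replay(0, n)` succeeds on the first attempt.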
> Eventually, once the cache is refreshed from meta, we get into an infinite 
> loop: this event can never be replicated because the location of the removed 
> replica will not appear again.
> {code}
> java.net.SocketTimeoutException: callTimeout=120, callDuration=2181316: 
> Can't get the location null
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:170)
>   at 
> org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint$RetryingRpcCallable.call(RegionReplicaReplicationEndpoint.java:606)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't 
> get the location
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:178)
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:105)
>   at 
> org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:89)
>   at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   ... 5 more
> Caused by: java.io.IOException: HRegionInfo was null in myTable, 
> row=keyvalues={myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:regioninfo/1531262022425/Put/vlen=41/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:seqnumDuringOpen/1531341209944/Put/vlen=8/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:server/1531341209944/Put/vlen=16/seqid=0,
>  
> myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:serverstartcode/1531341209944/Put/vlen=8/seqid=0}
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1289)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1179)
>   at 
>