[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-15 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r489018057



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   @infraio  could you please review again ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-12 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r487273056



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   > Operation timeout is not the best choice too but better
   
   @infraio  In scan operation, there are 2 operations. One is to wait for lock 
and other is to wait for rpc to complete. On top of that we have retries. The 
problem we are trying to solve here is what is the timeout to use for lock. If 
we wait for operation timeout period and if it can't get the lock after the 
timeout, it will not have any time remaining for next attempts. So I am 
confused when you suggest to use operation timeout, are you suggesting to wait 
for operation timeout period while trying to get lock ?

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you !

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you ! Cc @SukumarMaddineni 

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   > Operation timeout is not the best choice too but better
   
   @infraio  In scan operation, there are 2 operations. One is to wait for lock 
and other is to wait for rpc to complete. On top of that we have retries. The 
problem we are trying to solve here is what is the timeout to use for lock. If 
we wait for operation timeout period and if it can't get the lock after the 
timeout, it will not have any time remaining for next attempts. So I am 
confused when you suggest to use operation timeout, are you suggesting to wait 
for operation timeout period while trying to get lock ?

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you !

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you ! Cc @SukumarMaddineni 

##
File path: 

[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-12 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r487273056



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   > Operation timeout is not the best choice too but better
   
   @infraio  In scan operation, there are 2 operations. One is to wait for lock 
and other is to wait for rpc to complete. On top of that we have retries. The 
problem we are trying to solve here is what is the timeout to use for lock. If 
we wait for operation timeout period and if it can't get the lock after the 
timeout, it will not have any time remaining for next attempts. So I am 
confused when you suggest to use operation timeout, are you suggesting to wait 
for operation timeout period while trying to get lock ?

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you !

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you ! Cc @SukumarMaddineni 

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   > Operation timeout is not the best choice too but better
   
   @infraio  In scan operation, there are 2 operations. One is to wait for lock 
and other is to wait for rpc to complete. On top of that we have retries. The 
problem we are trying to solve here is what is the timeout to use for lock. If 
we wait for operation timeout period and if it can't get the lock after the 
timeout, it will not have any time remaining for next attempts. So I am 
confused when you suggest to use operation timeout, are you suggesting to wait 
for operation timeout period while trying to get lock ?

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you !

##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you ! Cc @SukumarMaddineni 





This is an automated 

[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-11 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r487274061



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you ! Cc @SukumarMaddineni 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-11 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r487274061



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @saintstack  Could you please chime in with your inputs. I think we are 
going back and forth on which timeout to use. Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 to wrap the whole scan 
operation within operation timeout but is outside the scope of this jira. Thank 
you !





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-11 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r487273056



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   > Operation timeout is not the best choice too but better
   
   @infraio  In scan operation, there are 2 operations. One is to wait for lock 
and other is to wait for rpc to complete. On top of that we have retries. The 
problem we are trying to solve here is what is the timeout to use for lock. If 
we wait for operation timeout period and if it can't get the lock after the 
timeout, it will not have any time remaining for next attempts. So I am 
confused when you suggest to use operation timeout, are you suggesting to wait 
for operation timeout period while trying to get lock ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-10 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r486558569



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   > What is the resolution here @shahrs87 + @infraio ?
   
   @infraio  pointed out one bug in my patch. Fixed it yesterday. Waiting for 
his feedback. Thank you for being so patient !





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-10 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r486492130



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
   @infraio  I moved the acquiring of lock inside try catch block. 
https://github.com/apache/hbase/blob/58cb1cdaba1dbf952647934be0f34b4bad2e71db/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L868
   Also added a test case for that.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-09 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r486075626



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   > Please check the code. The LockTimeoutException will be throwed out 
and terminate the loop. It is not in the try...catch code block.
   
   @infraio  you are right. Fixed that in latest commit. Please review again.
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-08 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r485216698



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   > If callable.prepare throw the LockTimeoutException, 
   
   IIUC the code, callable.prepare will never throw LockTimeoutException. We 
throw LockTimeoutException 
[here](https://github.com/apache/hbase/blob/f3210c8908809fa154118454e6a23f57bc222db1/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L866)
  At this place we haven't created ReversedClientScanner object also. There is 
one retry loop 
[here](https://github.com/apache/hbase/blob/f3210c8908809fa154118454e6a23f57bc222db1/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L848)
 which will retry if exception is not TNFE or retries are not exhausted.
   @infraio  please correct me if I misunderstood anything.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-05 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r483976537



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   Given that we have 4 +1s on using hbase.client.scanner.timeout.period, 
can we commit this patch ?  Also I have created 
https://issues.apache.org/jira/browse/HBASE-24983 for wrapping scan operation 
under operation timeout. 
   @apurtell  @saintstack  @bharathv  @virajjasani  @infraio  what do you guys 
think ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-04 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r483859942



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   Hi @infraio, 
   I am not able to understand your last comment. Could you please elaborate ? 
Thank you !





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-02 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r482175115



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   Let me explain my POV with one of our client example.
We have an internal customer who wants a strict 15 seconds SLA for every 
operation. Since operation timeout is end to end timeout which includes all the 
retries, sleep within retries so we suggested them to use operation timeout as 
15 seconds.
   Also we recommended them to set scanner timeout period to 7 seconds with  
retries config (hbase.client.retries.number)  set to 2 .  There is some sleep 
interval between each attempt and we expect the call to complete within 15 
seconds and if it doesn't then operation timeout will kick in and fail the call.
   But we found out that getRegionLocations call is not bounded by operation 
timeout.
   Now if we set the lock timeout to same as operation timeout, then in worst 
case scenario call will fail in 15 (lock timeout) + 15 (lock timeout 2nd try) + 
1 (assuming sleep of 1 second) = 31 seconds
   If we set the lock timeout to same as scanner timeout, then in worst case 
scenario the call will fail in 7 (scanner timeout) + 7 (scanner timeout) + 1 
(sleep between tries) = 16 seconds which is closer to SLA that we promised.
   Hope this makes sense. If still the community wants to go forward with 
operation timeout, I will change it to operation timeout.
   @infraio  @bharathv  @virajjasani  @saintstack  @apurtell 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-09-02 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r482175115



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   We have an internal customer who wants a strict 15 seconds SLA for every 
operation. Since operation timeout is end to end timeout which includes all the 
retries, sleep within retries so we suggested them to use operation timeout as 
15 seconds.
   Also we recommended them to set scanner timeout period to 7 seconds with  
retries config (hbase.client.retries.number)  set to 2 .  There is some sleep 
interval between each attempt and we expect the call to complete within 15 
seconds and if it doesn't then operation timeout will kick in and fail the call.
   But we found out that getRegionLocations call is not bounded by operation 
timeout.
   Now if we set the lock timeout to same as operation timeout, then in worst 
case scenario call will fail in 15 (lock timeout) + 15 (lock timeout 2nd try) + 
1 (assuming sleep of 1 second) = 31 seconds
   If we set the lock timeout to same as scanner timeout, then in worst case 
scenario the call will fail in 7 (scanner timeout) + 7 (scanner timeout) + 1 
(sleep between tries) = 16 seconds which is closer to SLA that we promised.
   Hope this makes sense. If still the community wants to go forward with 
operation timeout, I will change it to operation timeout.
   @infraio  @bharathv  @virajjasani  @saintstack  @apurtell 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-31 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r480476762



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   @saintstack  @apurtell  @infraio  We need to come to consensus on which 
timeout property to use ? IMHO don't think we should use operation timeout.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-31 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r480240959



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/LockTimeoutException.java
##
@@ -0,0 +1,32 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client;
+
+import org.apache.hadoop.hbase.HBaseIOException;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/*
+  Thrown whenever we are not able to get the lock within the specified wait 
time.
+ */
+@InterfaceAudience.Public

Review comment:
   Maybe when I create a new PR for branch-1, I can re-use the existing 
exception class. Would that work ?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-31 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r480239457



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/LockTimeoutException.java
##
@@ -0,0 +1,32 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client;
+
+import org.apache.hadoop.hbase.HBaseIOException;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/*
+  Thrown whenever we are not able to get the lock within the specified wait 
time.
+ */
+@InterfaceAudience.Public

Review comment:
   > I just realized there is an exact same class LockTimeoutException 
under org.apache.hadoop.hbase.exceptions, switch to that?
   
   This class only exists in branch-1 and not in master/branch-2. :(





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-31 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r480197093



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   > I am +1 to use operation timeout here.
   
   @infraio  the default value for operation timeout is 20 minutes. Do you want 
clients to wait for 20 minutes to acquire this lock ? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-29 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r479713606



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/LockTimeoutException.java
##
@@ -0,0 +1,32 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client;
+
+import org.apache.hadoop.hbase.HBaseIOException;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/*
+  Thrown whenever we are not able to get the lock within the specified wait 
time.
+ */
+@InterfaceAudience.Public

Review comment:
   Actually I don't know. Since this will thrown back all the way to client 
so thought to make it public. But open for suggestions.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-29 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r479713547



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionConfiguration.java
##
@@ -117,6 +119,11 @@
 
 this.writeRpcTimeout = conf.getInt(HConstants.HBASE_RPC_WRITE_TIMEOUT_KEY,
 conf.getInt(HConstants.HBASE_RPC_TIMEOUT_KEY, 
HConstants.DEFAULT_HBASE_RPC_TIMEOUT));
+
+this.scannerTimeoutPeriod = HBaseConfiguration.getInt(conf,

Review comment:
   Looks like HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY config 
property is deprecated after 0.96 release. So remove the deprecated config 
property altogether.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-28 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r479540958



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -863,13 +863,15 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
   }
   // Query the meta region
   long pauseBase = this.pause;
-  userRegionLock.lock();
+  takeUserRegionLock();
   try {
-if (useCache) {// re-check cache after get lock
-  RegionLocations locations = getCachedLocation(tableName, row);
-  if (locations != null && locations.getRegionLocation(replicaId) != 
null) {
-return locations;
-  }
+// We don't need to check if useCache is enabled or not. Even if 
useCache is false

Review comment:
   On my local system I created patch for this on top of 1.3 branch but 
while porting the patch to branch-2, I missed applying this hunk. 
   @saintstack  @apurtell  Since you guys already +1 the previous diff, wanted 
to bring this change to your attention. Sorry for confusion.
   Cc @bharathv  @infraio 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-27 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r478767046



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
 }
   }
 
+  private void takeUserRegionLock() throws IOException {
+try {
+  long waitTime = connectionConfig.getScannerTimeoutPeriod();
+  if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   @bharathv  @infraio  Thank you for your feedback. 
   
   > Specifically I was wondering if hbase.client.operation.timeout would be 
the right one to use for this. 
   
   I don't feel hbase.client.operation.timeout is the right choice here too. 
This config is meant for the whole end to end operation timeout which includes 
all layers of retries and the default value is 20 mins. If we use this timeout 
then we are not gaining anything.
   
   We can introduce a new config property (something like 
hbase.client.lock.timeout.period) and default it to something like 10 seconds. 
That way we don't depend on existing scanner/operation timeout periods. Let me 
know what you guys think. Thank you !





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-27 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r478701500



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/LockTimeoutException.java
##
@@ -0,0 +1,32 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client;
+
+import java.io.IOException;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/*
+  Thrown whenever we are not able to get the lock within the specified wait 
time.
+ */
+@InterfaceAudience.Public
+public class LockTimeoutException extends IOException {

Review comment:
   Fixed in latest commit. Please review again





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

2020-08-27 Thread GitBox


shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r478701500



##
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/LockTimeoutException.java
##
@@ -0,0 +1,32 @@
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.client;
+
+import java.io.IOException;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/*
+  Thrown whenever we are not able to get the lock within the specified wait 
time.
+ */
+@InterfaceAudience.Public
+public class LockTimeoutException extends IOException {

Review comment:
   Fixed in latest commit.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org