[jira] [Commented] (HBASE-28055) Performance improvement for scan over several stores.

2023-09-11 Thread Sergey Soldatov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764012#comment-17764012
 ] 

Sergey Soldatov commented on HBASE-28055:
-

[~bbeaudreault], I've added pull requests for branch-2.4, branch-2, and 
branch-2.5. That should fix the build failure. Sorry about that.

> Performance improvement for scan over several stores. 
> --
>
> Key: HBASE-28055
> URL: https://issues.apache.org/jira/browse/HBASE-28055
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-4, 2.5.5
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 4.0.0-alpha-1
>
>
> During the fix of HBASE-19863, an additional check for fake cells that 
> trigger reseek was added. It turns out that this check produces unnecessary 
> reseeks because matcher.compareKeyForNextColumn should be used only with 
> indexed keys. Later [~larsh] suggested doing a simple check for OLD_TIMESTAMP, 
> which looks like a better solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HBASE-28055) Performance improvement for scan over several stores.

2023-08-30 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28055 started by Sergey Soldatov.
---
> Performance improvement for scan over several stores. 
> --
>
> Key: HBASE-28055
> URL: https://issues.apache.org/jira/browse/HBASE-28055
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-4, 2.5.5
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> During the fix of HBASE-19863, an additional check for fake cells that 
> trigger reseek was added. It turns out that this check produces unnecessary 
> reseeks because matcher.compareKeyForNextColumn should be used only with 
> indexed keys. Later [~larsh] suggested doing a simple check for OLD_TIMESTAMP, 
> which looks like a better solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28055) Performance improvement for scan over several stores.

2023-08-30 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-28055:
---

 Summary: Performance improvement for scan over several stores. 
 Key: HBASE-28055
 URL: https://issues.apache.org/jira/browse/HBASE-28055
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.5, 3.0.0-alpha-4
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


During the fix of HBASE-19863, an additional check for fake cells that trigger 
reseek was added. It turns out that this check produces unnecessary reseeks 
because matcher.compareKeyForNextColumn should be used only with indexed keys. 
Later [~larsh] suggested doing a simple check for OLD_TIMESTAMP, which looks 
like a better solution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only

2023-02-09 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-27630 started by Sergey Soldatov.
---
> hbase-spark bulkload stage directory limited to hdfs only
> -
>
> Key: HBASE-27630
> URL: https://issues.apache.org/jira/browse/HBASE-27630
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 3.0.0-alpha-3
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> It's impossible to set the staging directory for the bulkload operation in the 
> spark-hbase connector to any filesystem other than HDFS. That might be a 
> problem for deployments where hbase.rootdir points to cloud storage: in this 
> case, an additional copy task from HDFS to cloud storage would be required 
> before loading hfiles into HBase.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only

2023-02-09 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-27630:
---

 Summary: hbase-spark bulkload stage directory limited to hdfs only
 Key: HBASE-27630
 URL: https://issues.apache.org/jira/browse/HBASE-27630
 Project: HBase
  Issue Type: Bug
  Components: spark
Affects Versions: 3.0.0-alpha-3
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


It's impossible to set the staging directory for the bulkload operation in the 
spark-hbase connector to any filesystem other than HDFS. That might be a problem 
for deployments where hbase.rootdir points to cloud storage: in this case, an 
additional copy task from HDFS to cloud storage would be required before loading 
hfiles into HBase.
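
As a purely illustrative sketch of the constraint (the paths and default value below are made up, not connector configuration), the two filesystems can be compared with plain Hadoop APIs:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hedged illustration: when the staging directory can only live on HDFS while
// hbase.rootdir points to cloud storage, the generated hfiles end up on a
// different filesystem and need an extra copy before the bulkload.
public class StagingFsCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path staging = new Path("hdfs:///tmp/hbase-staging");               // current behaviour
    Path rootDir = new Path(conf.get("hbase.rootdir", "s3a://hbase"));  // e.g. cloud storage
    FileSystem stagingFs = staging.getFileSystem(conf);
    FileSystem rootFs = rootDir.getFileSystem(conf);
    boolean sameFs = stagingFs.getUri().getScheme().equals(rootFs.getUri().getScheme());
    System.out.println("staging and hbase.rootdir on the same filesystem: " + sameFs);
  }
}
{code}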



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.

2023-01-19 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov resolved HBASE-20719.
-
Resolution: Duplicate

> HTable.batch() doesn't handle TableNotFound correctly.
> --
>
> Key: HBASE-20719
> URL: https://issues.apache.org/jira/browse/HBASE-20719
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Minor
>
> batch() as well as delete() are processed using AsyncRequest. To report 
> problems we use RetriesExhaustedWithDetailsException, and there is no special 
> handling for the TableNotFound exception. So the final result of running batch 
> or delete operations against a non-existing table looks really weird and 
> misleading:
> {noformat}
> hbase(main):003:0> delete 't1', 'r1', 'c1'
> 2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot 
> get replica 0 location for 
> {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}
> ERROR: Failed 1 action: t1: 1 time, servers with issues: null
> {noformat}
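
For reference, a minimal client-side sketch of the same failure mode (table and row names are made up); the caller only sees a generic RetriesExhaustedWithDetailsException instead of a TableNotFoundException:

{code}
import java.util.Collections;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: a batch against a table that does not exist fails with the generic
// "Failed 1 action ... servers with issues: null" style summary.
public class BatchAgainstMissingTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("t1"))) {   // 't1' does not exist
      Delete delete = new Delete(Bytes.toBytes("r1")).addFamily(Bytes.toBytes("c1"));
      table.batch(Collections.singletonList(delete), new Object[1]);
    } catch (RetriesExhaustedWithDetailsException e) {
      System.out.println(e.getExhaustiveDescription()); // misleading summary, no TableNotFound
    }
  }
}
{code}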



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-20926) IntegrationTestRSGroup is broken

2023-01-19 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov resolved HBASE-20926.
-
Resolution: Invalid

The class was refactored into a number of smaller tests.

> IntegrationTestRSGroup is broken 
> -
>
> Key: HBASE-20926
> URL: https://issues.apache.org/jira/browse/HBASE-20926
> Project: HBase
>  Issue Type: Bug
>  Components: integration tests
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0-alpha-4
>
>
> There are several problems:
> 1. It doesn't work in minicluster mode because afterMethod() uses 
> IntegrationTestingUtility.restoreCluster(), which just shuts down the 
> minicluster in non-distributed mode.
> 2. It uses tests from TestRSGroups, which was supposed to be common for both 
> unit and integration tests, but over the last two years a number of tests were 
> added that use internal APIs and are not compatible with distributed mode. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27397) Spark-hbase support for 'startWith' predicate.

2022-09-27 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-27397:
---

 Summary: Spark-hbase support for 'startWith' predicate.
 Key: HBASE-27397
 URL: https://issues.apache.org/jira/browse/HBASE-27397
 Project: HBase
  Issue Type: New Feature
  Components: hbase-connectors
Affects Versions: 3.0.0-alpha-3
Reporter: Sergey Soldatov


Currently, the spark-hbase connector doesn't support the stringStartWith 
predicate and completely ignores it. This is a disadvantage compared to the 
Apache Phoenix connector and the old SHC (Hortonworks spark-hbase connector). It 
would be nice to also have this functionality in the current connector.
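
For context, a hedged sketch of what a pushed-down stringStartWith predicate on the row key could translate to on the HBase side (the API choice here is illustrative, not the connector's actual pushdown code):

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Hedged sketch: a StringStartsWith predicate on the row key can be served by a
// prefix scan instead of a full table scan, which is the optimization the old SHC
// and the Phoenix connector provide.
public class PrefixPushdown {
  public static Scan toPrefixScan(String prefix) {
    Scan scan = new Scan();
    scan.setRowPrefixFilter(Bytes.toBytes(prefix)); // restricts start/stop rows to the prefix
    return scan;
  }
}
{code}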



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-27061) two phase bulkload is broken when SFT is in use.

2022-05-24 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-27061:
---

 Summary: two phase bulkload is broken when SFT is in use.
 Key: HBASE-27061
 URL: https://issues.apache.org/jira/browse/HBASE-27061
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.12
Reporter: Sergey Soldatov


In HBASE-26707, for the SFT case, we write files directly to the region 
location. For that we use HRegion.regionDir as the staging directory. The 
problem is that in reality this dir points to the WAL dir, so for S3 deployments 
it would point to HDFS. As a result, during the execution of 
LoadIncrementalHFiles the process fails with the exception:
{noformat}
2022-05-24 03:31:23,656 ERROR 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete 
bulk load
java.lang.IllegalArgumentException: Wrong FS 
hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da
 -expected s3a://hbase
at 
org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397)
at 
org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994)
at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291)
at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879)
at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)

 {noformat}
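
A small illustrative sketch of the mismatch (the example URIs in the comments are just that, examples):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.util.CommonFSUtils;

// Hedged sketch: with SFT, a staging dir derived from HRegion.regionDir ends up
// under the WAL root (e.g. hdfs://...), while the hfiles must land under the data
// root (e.g. s3a://...), producing the "Wrong FS" failure above.
public class WalVsDataRoot {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path walRoot = CommonFSUtils.getWALRootDir(conf); // e.g. hdfs://ns1/hbase-wals
    Path dataRoot = CommonFSUtils.getRootDir(conf);   // e.g. s3a://hbase
    System.out.println("WAL root:  " + walRoot);
    System.out.println("data root: " + dataRoot);
    // The bulkload staging path should be resolved against dataRoot's filesystem,
    // not walRoot's.
  }
}
{code}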



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Assigned] (HBASE-27061) two phase bulkload is broken when SFT is in use.

2022-05-24 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-27061:
---

Assignee: Sergey Soldatov

> two phase bulkload is broken when SFT is in use.
> 
>
> Key: HBASE-27061
> URL: https://issues.apache.org/jira/browse/HBASE-27061
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.12
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> In HBASE-26707, for the SFT case, we write files directly to the region 
> location. For that we use HRegion.regionDir as the staging directory. 
> The problem is that in reality this dir points to the WAL dir, so for 
> S3 deployments it would point to HDFS. As a result, during the 
> execution of LoadIncrementalHFiles the process fails with the exception:
> {noformat}
> 2022-05-24 03:31:23,656 ERROR 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to 
> complete bulk load
> java.lang.IllegalArgumentException: Wrong FS 
> hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da
>  -expected s3a://hbase
> at 
> org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375)
> at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521)
> at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (HBASE-27061) two phase bulkload is broken when SFT is in use.

2022-05-24 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-27061 started by Sergey Soldatov.
---
> two phase bulkload is broken when SFT is in use.
> 
>
> Key: HBASE-27061
> URL: https://issues.apache.org/jira/browse/HBASE-27061
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.12
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> In HBASE-26707, for the SFT case, we write files directly to the region 
> location. For that we use HRegion.regionDir as the staging directory. 
> The problem is that in reality this dir points to the WAL dir, so for 
> S3 deployments it would point to HDFS. As a result, during the 
> execution of LoadIncrementalHFiles the process fails with the exception:
> {noformat}
> 2022-05-24 03:31:23,656 ERROR 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to 
> complete bulk load
> java.lang.IllegalArgumentException: Wrong FS 
> hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da
>  -expected s3a://hbase
> at 
> org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375)
> at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521)
> at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:360)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879)
> at 
> org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453)
> at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339)
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Work started] (HBASE-27053) IOException during caching of uncompressed block to the block cache.

2022-05-19 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-27053 started by Sergey Soldatov.
---
> IOException during caching of uncompressed block to the block cache.
> 
>
> Key: HBASE-27053
> URL: https://issues.apache.org/jira/browse/HBASE-27053
> Project: HBase
>  Issue Type: Bug
>  Components: BlockCache
>Affects Versions: 2.4.12
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> When prefetch to block cache is enabled and blocks are compressed, caching 
> sometimes fails with the exception:
> {noformat}
> 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] 
> regionserver.HRegion: Could not initialize all stores for the 
> region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7.
> 2022-05-18 21:37:29,598 WARN  [RS_OPEN_REGION-regionserver/x1:16020-2] 
> regionserver.HRegion: Failed initialize of region= 
> cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., 
> starting to roll back memstore
> java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached 
> block contents differ, which should not have 
> happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.lang.RuntimeException: Cached block 
> contents differ, which should not have 
> happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
>   at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344)
>   at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:294)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375)
>   at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115)
>   at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   ... 3 more
> Caused by: java.lang.RuntimeException: Cached block contents differ, which 
> should not have 
> happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
>   at 
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199)
>   at 
> org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432)
>   at 
> org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418)
>   at 
> org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319)
>   at java.util.Optional.ifPresent(Optional.java:159)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942)
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931)
>   at 
> 

[jira] [Commented] (HBASE-27053) IOException during caching of uncompressed block to the block cache.

2022-05-19 Thread Sergey Soldatov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539859#comment-17539859
 ] 

Sergey Soldatov commented on HBASE-27053:
-

So, why and when does it happen? The usual scenario is when a region split has 
just happened and the RS is trying to open both daughters and load them into the 
bucket cache. There is a single store that contains the split point, so two 
threads read the same block and try to store it in the cache. When we decompress 
the block, the first thing we do is the space allocation:

[https://github.com/apache/hbase/blob/c7eb30d91015de67fb8207ac1818ce2a29dd60a4/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L649]

Here we allocate space for the uncompressed data *AND* the checksum. But we fill 
the data only, without filling the checksum space, which might actually contain 
garbage from previous usage. So the block is cached with this garbage, and 
obviously that part may not (and will not) match when we try to store the same 
block from another thread. 

Unfortunately, I was unable to create a reasonable unit test for this scenario, 
but here are the manual steps to reproduce:

A single-node cluster with the following configuration tweaks:

hbase.hregion.memstore.flush.size=100
hbase.hregion.max.filesize=1000
hbase.bucketcache.ioengine=file:/tmp/hbase_cache
hbase.bucketcache.size=20
hbase.hfile.thread.prefetch=4

And the load is generated by:
{noformat}

hbase org.apache.hadoop.hbase.util.LoadTestTool -compression SNAPPY -write 
1:10:100 -num_keys 1 {noformat}
Usually, after 700k-1.2m records, the previously mentioned exception appears in 
the RS log.

So, to solve the problem I would suggest adding code that cleans up the checksum 
space when decompression is completed. It doesn't look like an optimal solution, 
but right after decompression we don't know whether the checksum space will be 
used or not, so we cannot just trim the ByteBuff. 
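
A minimal sketch of that cleanup in terms of the public ByteBuff API; the size parameters are placeholders for the values available at that point in HFileBlock, not the actual patch:

{code}
import org.apache.hadoop.hbase.nio.ByteBuff;

// Hedged sketch of the suggested fix: after decompressing into a buffer that was
// allocated for data AND checksum, zero the checksum region so that two threads
// caching the same block produce byte-identical contents.
public final class ChecksumSpaceCleaner {
  static void zeroChecksumSpace(ByteBuff block, int dataSize, int sizeWithChecksum) {
    for (int i = dataSize; i < sizeWithChecksum; i++) {
      block.put(i, (byte) 0); // clear garbage left over from previous buffer usage
    }
  }
}
{code}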

> IOException during caching of uncompressed block to the block cache.
> 
>
> Key: HBASE-27053
> URL: https://issues.apache.org/jira/browse/HBASE-27053
> Project: HBase
>  Issue Type: Bug
>  Components: BlockCache
>Affects Versions: 2.4.12
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> When prefetch to block cache is enabled and blocks are compressed, caching 
> sometimes fails with the exception:
> {noformat}
> 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] 
> regionserver.HRegion: Could not initialize all stores for the 
> region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7.
> 2022-05-18 21:37:29,598 WARN  [RS_OPEN_REGION-regionserver/x1:16020-2] 
> regionserver.HRegion: Failed initialize of region= 
> cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., 
> starting to roll back memstore
> java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached 
> block contents differ, which should not have 
> happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996)
>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090)
>   at 
> org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.IOException: java.lang.RuntimeException: Cached block 
> contents differ, which should not have 
> happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
>   at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344)
>   at 

[jira] [Created] (HBASE-27053) IOException during caching of uncompressed block to the block cache.

2022-05-18 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-27053:
---

 Summary: IOException during caching of uncompressed block to the 
block cache.
 Key: HBASE-27053
 URL: https://issues.apache.org/jira/browse/HBASE-27053
 Project: HBase
  Issue Type: Bug
  Components: BlockCache
Affects Versions: 2.4.12
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


When prefetch to block cache is enabled and blocks are compressed, caching 
sometimes fails with the exception:
{noformat}
2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] 
regionserver.HRegion: Could not initialize all stores for the 
region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7.
2022-05-18 21:37:29,598 WARN  [RS_OPEN_REGION-regionserver/x1:16020-2] 
regionserver.HRegion: Failed initialize of region= 
cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., starting 
to roll back memstore
java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached 
block contents differ, which should not have 
happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
  at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996)
  at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134)
  at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090)
  at 
org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147)
  at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: java.lang.RuntimeException: Cached block 
contents differ, which should not have 
happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
  at 
org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294)
  at 
org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344)
  at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:294)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375)
  at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115)
  at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  ... 3 more
Caused by: java.lang.RuntimeException: Cached block contents differ, which 
should not have 
happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0
  at 
org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199)
  at 
org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231)
  at 
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447)
  at 
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432)
  at 
org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418)
  at 
org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60)
  at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319)
  at java.util.Optional.ifPresent(Optional.java:159)
  at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317)
  at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942)
  at 
org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931)
  at 
org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:171)
  at 
org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:321)
  at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:477)
  at 
org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:490)
  at 

[jira] [Commented] (HBASE-26972) Restored table from snapshot that has MOB is inconsistent

2022-04-22 Thread Sergey Soldatov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526717#comment-17526717
 ] 

Sergey Soldatov commented on HBASE-26972:
-

The reason for that: 
/hbase2.4/mobdir/data/default/table_1/1bccf339572b9a4db7475abcf57eeb8f/data/table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
which is a link to 
/hbase2.4/archive/data/default/table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
But HFileLink is unable to detect that this is a link because of the underscore 
in the middle of the file name, which doesn't fit the pattern:
{code}
public static final String HFILE_NAME_REGEX = 
"[0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?";
{code}
So isHFileLink would return false for such files, and that leads to 
unpredictable results. 
The easy way to fix it is to allow an underscore in HFILE_NAME_REGEX, but I feel 
a bit uncomfortable about that. I would really appreciate suggestions. 
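
A self-contained illustration of the mismatch (checking just the referenced hfile-name portion against the regex above):

{code}
// The extra underscore before the second hash is not one of the allowed
// _SeqId_<n>_ or _del suffixes, so the referenced MOB hfile name fails the pattern.
public class MobNameRegexCheck {
  public static void main(String[] args) {
    String hfileNameRegex = "[0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?";
    String mobHFile = "bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613"
        + "_49a15ec2a84c8489965d1910a05cca3a";
    System.out.println(mobHFile.matches(hfileNameRegex)); // prints false
  }
}
{code}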

> Restored table from snapshot that has MOB is inconsistent
> -
>
> Key: HBASE-26972
> URL: https://issues.apache.org/jira/browse/HBASE-26972
> Project: HBase
>  Issue Type: Bug
>  Components: mob, snapshots
>Affects Versions: 3.0.0-alpha-2
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> When we restore a table from a snapshot and it has MOB files, there are links 
> that do not fit the HFileLink pattern. I'm not sure which side effects that 
> might lead to, but at least it's not possible to create a snapshot right 
> after the restore:
> {quote}
> Version 3.0.0-alpha-3-SNAPSHOT, rcd45cadbc1a42db359ff4e775cbd4b55cfe28140, 
> Fri Apr 22 03:04:25 PM PDT 2022
> Took 0.0016 seconds   
>   
>   
> hbase:001:0> list_snapshot
> list_snapshot_sizes   list_snapshots
> hbase:001:0> list_snapshots
> SNAPSHOT TABLE + CREATION TIME
>   
>   
>  t1  table_1 (2022-04-22 15:48:04 -0700)  
>   
>   
> 1 row(s)
> Took 1.0881 seconds   
>   
>   
> => ["t1"]
> hbase:002:0> restore_snapshot 't1'
> Took 2.3942 seconds   
>   
>   
> hbase:003:0> snapshot
> snapshot   snapshot_cleanup_enabled   snapshot_cleanup_switch 
>
> hbase:003:0> snapshot 'table_1', 't2'
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
> ss=t2 table=table_1 type=FLUSH ttl=0 } had an error.  Procedure t2 { 
> waiting=[] done=[] }
>   at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:403)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1325)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>   at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:106)
>   at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:86)
> Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
> Failed taking snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } due to 
> exception:Can't find hfile: 
> table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
>  in the real 
> (hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
>  or archive 
> (hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
>  directory for the primary 
> table.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't 
> find hfile: 
> 

[jira] [Created] (HBASE-26972) Restored table from snapshot that has MOB is inconsistent

2022-04-22 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-26972:
---

 Summary: Restored table from snapshot that has MOB is inconsistent
 Key: HBASE-26972
 URL: https://issues.apache.org/jira/browse/HBASE-26972
 Project: HBase
  Issue Type: Bug
  Components: mob, snapshots
Affects Versions: 3.0.0-alpha-2
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


When we restore a table from a snapshot and it has MOB files, there are links 
that do not fit the HFileLink pattern. I'm not sure which side effects that might 
lead to, but at least it's not possible to create a snapshot right after the 
restore:
{quote}
Version 3.0.0-alpha-3-SNAPSHOT, rcd45cadbc1a42db359ff4e775cbd4b55cfe28140, Fri 
Apr 22 03:04:25 PM PDT 2022
Took 0.0016 seconds 
  
hbase:001:0> list_snapshot
list_snapshot_sizes   list_snapshots
hbase:001:0> list_snapshots
SNAPSHOT TABLE + CREATION TIME  
  
 t1  table_1 (2022-04-22 15:48:04 -0700)
  
1 row(s)
Took 1.0881 seconds 
  
=> ["t1"]
hbase:002:0> restore_snapshot 't1'
Took 2.3942 seconds 
  
hbase:003:0> snapshot
snapshot   snapshot_cleanup_enabled   snapshot_cleanup_switch   
 
hbase:003:0> snapshot 'table_1', 't2'

ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
ss=t2 table=table_1 type=FLUSH ttl=0 } had an error.  Procedure t2 { waiting=[] 
done=[] }
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:403)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1325)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:106)
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:86)
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via 
Failed taking snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } due to 
exception:Can't find hfile: 
table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
 in the real 
(hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
 or archive 
(hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
 directory for the primary 
table.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find 
hfile: 
table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
 in the real 
(hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
 or archive 
(hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a)
 directory for the primary table.
at 
org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82)
at 
org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:322)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:392)
... 6 more
Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't 
find hfile: 
table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
 

[jira] [Commented] (HBASE-26767) Rest server should not use a large Header Cache.

2022-02-23 Thread Sergey Soldatov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496975#comment-17496975
 ] 

Sergey Soldatov commented on HBASE-26767:
-

[~elserj] You are absolutely right. I was observing this issue with Kerberos 
authn. The same connection was used for different logins, and after a while, 
when the cache overflowed, the service started to return authn failures 
(401/403/400).

> Rest server should not use a large Header Cache.
> 
>
> Key: HBASE-26767
> URL: https://issues.apache.org/jira/browse/HBASE-26767
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.4.9
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE 
> (65536). That's not compatible with jetty-9.4.x because the cache size is 
> limited to Character.MAX_VALUE - 1 (65534) there. According to the Jetty 
> source code comments, higher values can cause a buffer overflow in the cache, 
> which might lead to wrong/incomplete values being returned by the cache and 
> subsequent incorrect header handling.  
> There are a couple of ways to fix it:
> 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534
> 2. make the header cache size configurable and set it separately from the 
> header size. 
> I believe the second would give us more flexibility.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HBASE-26767) Rest server should not use a large Header Cache.

2022-02-23 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26767 started by Sergey Soldatov.
---
> Rest server should not use a large Header Cache.
> 
>
> Key: HBASE-26767
> URL: https://issues.apache.org/jira/browse/HBASE-26767
> Project: HBase
>  Issue Type: Bug
>  Components: REST
>Affects Versions: 2.4.9
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE 
> (65536). That's not compatible with jetty-9.4.x because the cache size is 
> limited to Character.MAX_VALUE - 1 (65534) there. According to the Jetty 
> source code comments, higher values can cause a buffer overflow in the cache, 
> which might lead to wrong/incomplete values being returned by the cache and 
> subsequent incorrect header handling.  
> There are a couple of ways to fix it:
> 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534
> 2. make the header cache size configurable and set it separately from the 
> header size. 
> I believe the second would give us more flexibility.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26767) Rest server should not use a large Header Cache.

2022-02-22 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-26767:
---

 Summary: Rest server should not use a large Header Cache.
 Key: HBASE-26767
 URL: https://issues.apache.org/jira/browse/HBASE-26767
 Project: HBase
  Issue Type: Bug
  Components: REST
Affects Versions: 2.4.9
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE 
(65536). That's not compatible with jetty-9.4.x because the cache size is 
limited to Character.MAX_VALUE - 1 (65534) there. According to the Jetty source 
code comments, higher values can cause a buffer overflow in the cache, which 
might lead to wrong/incomplete values being returned by the cache and subsequent 
incorrect header handling.  
There are a couple of ways to fix it:
1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534
2. make the header cache size configurable and set it separately from the header 
size. 

I believe the second would give us more flexibility.
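
A rough sketch of option 2, assuming an illustrative configuration key ("hbase.rest.header.cache.size" is not an existing property) and Jetty 9.4's HttpConfiguration:

{code}
import org.apache.hadoop.conf.Configuration;
import org.eclipse.jetty.server.HttpConfiguration;

// Hedged sketch of option 2: cap the header cache independently of the maximum
// header size, staying below the Jetty 9.4 limit of Character.MAX_VALUE - 1.
public class RestHeaderCacheSetup {
  static HttpConfiguration build(Configuration conf, int maxHeaderSize) {
    int cacheSize = conf.getInt("hbase.rest.header.cache.size",   // illustrative key
        Math.min(maxHeaderSize, Character.MAX_VALUE - 1));
    HttpConfiguration httpConfig = new HttpConfiguration();
    httpConfig.setRequestHeaderSize(maxHeaderSize);
    httpConfig.setHeaderCacheSize(cacheSize);
    return httpConfig;
  }
}
{code}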



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Work started] (HBASE-26463) Unreadable table names after HBASE-24605

2021-11-17 Thread Sergey Soldatov (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26463 started by Sergey Soldatov.
---
> Unreadable table names after HBASE-24605
> 
>
> Key: HBASE-26463
> URL: https://issues.apache.org/jira/browse/HBASE-26463
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Trivial
> Attachments: After.png, Before.png
>
>
> During the fix for HBASE-24605 (Break long region names in the web UI), the 
> 'word-break' rule was applied to the 'table-striped' style, so literally all 
> tables in the Master Web UI were affected. The most noticeable is 'User 
> Tables', where table names, as well as descriptions and even State, are 
> wrapped in a very ugly way. 
> We have two options here: 
> 1. Fix the user tables only, as was done previously for the procedures table
> 2. Apply 'word-break' only to those tables that require it. 
> Since most users are comfortable with the current UI, we may go the first way, 
> so the changes would affect only the User Tables table. 
> Sample screenshots before and after are attached. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (HBASE-26463) Unreadable table names after HBASE-24605

2021-11-17 Thread Sergey Soldatov (Jira)
Sergey Soldatov created HBASE-26463:
---

 Summary: Unreadable table names after HBASE-24605
 Key: HBASE-26463
 URL: https://issues.apache.org/jira/browse/HBASE-26463
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 3.0.0-alpha-1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
 Attachments: After.png, Before.png

During the fix for HBASE-24605 (Break long region names in the web UI), the 
'word-break' rule was applied to the 'table-striped' style, so literally all 
tables in the Master Web UI were affected. The most noticeable is 'User Tables', 
where table names, as well as descriptions and even State, are wrapped in a very 
ugly way. 
We have two options here: 
1. Fix the user tables only, as was done previously for the procedures table
2. Apply 'word-break' only to those tables that require it. 
Since most users are comfortable with the current UI, we may go the first way, 
so the changes would affect only the User Tables table. 
Sample screenshots before and after are attached. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (HBASE-26267) Master initialization fails if Master Region WAL dir is missing

2021-10-11 Thread Sergey Soldatov (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427210#comment-17427210
 ] 

Sergey Soldatov commented on HBASE-26267:
-

[~zhangduo] [~zyork] do you have any other concerns regarding Josh's PR? Could 
we get it in?

> Master initialization fails if Master Region WAL dir is missing
> ---
>
> Key: HBASE-26267
> URL: https://issues.apache.org/jira/browse/HBASE-26267
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 2.4.6
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Major
> Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.8
>
>
> From a recent branch-2.4 build:
> {noformat}
> 2021-09-07 19:31:19,666 ERROR [master/localhost:16000:becomeActiveMaster] 
> master.HMaster(159): * ABORTING master localhost,16000,1631057476442: 
> Unhandled exception. Starting shutdown. *
> java.io.FileNotFoundException: File 
> hdfs://localhost:8020/hbase-2.4-wals/MasterData/WALs does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:226)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:303)
> at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
> at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:839)
> at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2189)
> at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:512)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
> If the WAL directory is missing but the Master Region already exists, we will 
> try to list the contents of the Master Region's WAL directory, which may or 
> may not exist. If we simply check first to make sure the directory exists, 
> the rest of the initialization code works as expected.
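
A hedged sketch of that guard (not the actual patch), using plain FileSystem calls:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hedged sketch: only list the Master Region WAL directory when it exists, so a
// missing .../MasterData/WALs dir no longer aborts master initialization.
public class MasterWalDirGuard {
  static FileStatus[] listWalDirs(Configuration conf, Path walsDir) throws Exception {
    FileSystem walFs = walsDir.getFileSystem(conf);
    return walFs.exists(walsDir) ? walFs.listStatus(walsDir) : new FileStatus[0];
  }
}
{code}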



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-17 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617877#comment-16617877
 ] 

Sergey Soldatov commented on HBASE-20952:
-

bq. However, when we think about hypothetical log systems, we could see an 
optimization where the filtering of _just one Region's_ edits from a log can be 
pushed down into the log system itself. Such a feature would remove the need 
for WAL splitting to happen universally in HBase.

That's exactly what I wanted to say. There are some cases where we don't need to 
split WALs (WAL-per-RS model, streams with filters, etc.). And the question is 
whether we need to do any significant refactoring in this area (i.e., make 
WALSplitter work with abstract WALs and create new WALs), or whether we may live 
with the case where WALProvider has an interface to get recovered edits for a 
particular region; for the current providers the result of the WALSplitter would 
be returned, but for custom providers that would be implementation-specific. 

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-14 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615326#comment-16615326
 ] 

Sergey Soldatov commented on HBASE-20952:
-

A few of my personal thoughts, based on previous experience implementing this 
kind of functionality in HBase (C5, non-stop HBase, HydraBase). It would be 
better to perform atomic, meaningful changes that address a particular problem 
without breaking the existing functionality and without adding additional 
complexity. We need a 'pluggable' WALProvider that is agnostic to the file 
system structure. Fine. Let's make it configurable and remove the dependency on 
file systems (introducing WALIdentity or whatever we call it). It doesn't work 
with the current implementation of replication? Not good, but we may mention 
that in the Limitations section. All we need is an understanding that it's 
possible to implement, even if it would require a lot of effort. Possibly 
replication itself requires a new Jira, 'Revisit Replication approach', 
especially if we are talking about different kinds of WAL providers. Possibly 
someone would love to have a setup with replication between an on-prem cluster 
that uses a regular WAL and another one in AWS using a BookKeeper WAL. And that 
would require another layer of abstraction for replication itself. Should it be 
done under this Jira? Honestly speaking, I don't think so. 

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-14 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615265#comment-16615265
 ] 

Sergey Soldatov commented on HBASE-20952:
-

{quote}In the old time we will roll the wal writer, and it is done by RS, so 
closing the wal file is not enough, as the RS will try to open a new one and 
write to it. That's why we need to rename the wal directory.
{quote}
For a WAL provider that doesn't depend on the HDFS directory structure, there 
should be a manager that keeps information about the existing logs. The 
internals are implementation specific (for a Kafka WAL provider it may be a 
separate topic or some internal DB; for consensus-based logs like the Ratis 
LogService it might be a separate state machine), but any new log should be 
registered there. Adding a new method such as 'disable'/'decommission' to 
WALProvider that tells the manager to reject new logs for a particular RS (or 
even for a particular region, if we consider a WAL-per-region schema) is not a 
problem. For the existing WAL providers, that method may simply rename the WAL 
directory.
{quote}In your words above, it seems to me that we will only have one stream 
opened forever for a RS, then how do we drop the old edits after flush? And how 
do we setup the wal stream? Only once at the RS start up? And if there are 
errors later, we just abort?
{quote}
Not necessarily. There is no problem with having a WAL per region; actually, in 
some cases it would be preferable, for example a Kafka topic per region: any 
kind of recovery would be a simple subscribe/replay of the particular topic, no 
log splits, less offline time. For the regular case we are not talking about 
streams; it's just a WAL implementation that supports the append operation. For 
replication/recovery we should be able to get a stream and read from a 
particular ID/offset. Error handling should be hidden by the implementation. A 
simple example for a quorum-based implementation: we have a 3-node quorum for 
log 'RS1.1' (RS1, RS2, RS3). RS2 and RS3 go down for some reason, so we lose 
the majority and this quorum becomes read-only. A new log 'RS1.2' is created 
with the quorum (RS1, RS4, RS5) and all writes go there. But if we speak about 
the reading stream, it would provide a single instance that iterates through 
RS1.1 and RS1.2 continuously. The same approach may be applied to the existing 
wal files as well.
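
The 'single reading stream over several logs' point in the example above (RS1.1 
followed by RS1.2) can be sketched as below. It builds on the hypothetical 
PluggableWALProvider/WALIdentity interfaces sketched earlier in this thread and is not 
actual HBase code.

{code:java}
// Illustrative only: present RS1.1 and then RS1.2 to the reader as one continuous stream.
package org.example.wal;

import java.io.IOException;
import java.util.Iterator;
import java.util.List;

class ChainedLogReader {
  private final PluggableWALProvider provider;
  private final Iterator<WALIdentity> logs;
  private Iterator<byte[]> current;

  ChainedLogReader(PluggableWALProvider provider, List<WALIdentity> orderedLogs) {
    this.provider = provider;
    this.logs = orderedLogs.iterator();
  }

  /** Returns the next entry, transparently crossing log boundaries; null at the end. */
  byte[] next() throws IOException {
    while (current == null || !current.hasNext()) {
      if (!logs.hasNext()) {
        return null;                           // the whole chain is exhausted
      }
      current = provider.readLog(logs.next()); // move on to the next log in the chain
    }
    return current.next();
  }
}
{code}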
{quote}And for the FileSystem, we will use multi wal to increase the 
performance, and the logic is messed up with WALProvider. Does ratis still need 
multi wal to increase the performance? And if not, what's the plan? We need to 
refactor the multi wal related code, to not work against the WALProvider but 
something with the FileSystem related stuffs directly?
{quote}
That might be done in a further refactoring of multiwal. At the moment the 
approach is that we may specify a 3rd-party WAL provider class in WALFactory, 
and if it's there, multiwal would not be used at all, since multiwal is itself 
a provider class. On the other hand, it could be refactored into something like 
a 'WAL strategy' that works with any kind of provider.
{quote}had mentioned offline yesterday that he thinks some gaps still exist 
around WAL splitting – do you understand that well enough to suggest what needs 
to be addressed in the doc which is not already there?
{quote}
WALSplitter is a separate topic for discussion. The current implementation has 
a bunch of dependencies on file operations such as temporary files, the list of 
corrupted files, etc. From the HBase perspective it would be much easier to 
keep it as is and make the log splitter an interface that takes a log and 
creates a list of recovery logs. But from the perspective of a 3rd-party WAL 
developer it would be a nightmare to handle all possible cases and fit into the 
split-log chore logic. On the other hand, for the first iteration this may be 
hidden by a schema where a 3rd-party WAL does not use the splitter at all and 
recovery is just reading a stream of records provided by the WALProvider for a 
particular region
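
A possible shape for such a pluggable splitting contract, again purely illustrative 
(hypothetical names, not the actual WALSplitter API):

{code:java}
// Illustrative only: a 3rd-party WAL can implement this, or skip splitting entirely
// and replay a per-region stream on recovery instead.
package org.example.wal;

import java.io.IOException;
import java.util.List;

interface WALSplitStrategy {
  /** Split one source log into per-region recovery logs; may be a no-op. */
  List<WALIdentity> split(WALIdentity source) throws IOException;
}
{code}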

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup 

[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-03 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602483#comment-16602483
 ] 

Sergey Soldatov commented on HBASE-20952:
-

bq. Will the old FB's hydrabase impl help here in determining the APIs needed 
here?

If we are talking about HBASE-12259, then no. Actually, most of the work for 
HydraBase went into the consensus protocol implementation, and only a few 
attempts were made to apply it to the WAL system itself (those were eventually 
dropped because the hbase-consensus module was not accepted). We don't want to 
add our own implementation of a quorum-based consensus protocol. We want to 
make the current WAL system flexible enough to build a new WAL implementation 
on top of either a 3rd-party consensus protocol implementation (Raft/Paxos/etc.) 
or any existing distributed log implementation (Apache Kafka, Apache BookKeeper, 
etc.). The interfaces should be simple, with a meaningful public contract, and 
the number of interfaces to implement should be reasonable as well. 

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in a part). We need to also consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}} should also be looked at to use WAL-APIs only).
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558989#comment-16558989
 ] 

Sergey Soldatov commented on HBASE-20657:
-

Oops, I accidentally switched to branch-2 from master. Actually, the changes 
for MasterProcedureScheduler were added to master as part of HBASE-20569. Not 
sure what we should do for branch-2 in that case. WDYT [~elserj]?

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558973#comment-16558973
 ] 

Sergey Soldatov commented on HBASE-20657:
-

Removed the internal lock implementation; thanks [~Apache9] for making that a 
part of the Procedure class.
I still think the changes in MasterProcedureScheduler.java should be applied. 
To reproduce the problem, make MTP hold the lock (as the patch does) and run 
the provided test. Everything ends up with a number of regions stuck in the RIT 
state forever. The problem is in the state machine optimization, not in MTP 
itself, so it may happen with any other complex procedure that holds the lock. 
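
For context, the 'make MTP hold the lock' part of the reproduction usually boils down 
to an override like the one below. This is a minimal sketch assuming the proc-v2 
holdLock(env) hook; it is not the patch attached to this issue, and the class name is 
made up.

{code:java}
// Minimal sketch: keep the exclusive table lock for the whole lifetime of the
// procedure instead of releasing it between state transitions.
import org.apache.hadoop.hbase.master.procedure.MasterProcedureEnv;
import org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure;

public class LockHoldingModifyTableProcedure extends ModifyTableProcedure {
  @Override
  protected boolean holdLock(MasterProcedureEnv env) {
    return true;   // do not give up the table lock until the procedure finishes
  }
}
{code}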

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-20657:
---

Assignee: stack  (was: Sergey Soldatov)

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-20657:
---

Assignee: Sergey Soldatov  (was: stack)

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Fix Version/s: 3.0.0

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-4-master.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.0.2
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558600#comment-16558600
 ] 

Sergey Soldatov commented on HBASE-20927:
-

[~elserj] Almost. After some modifications (removing the dependencies on 
internal API from TestRSGroupBase) I noticed this. 

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20927:

Status: Patch Available  (was: Open)

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20927:

Attachment: HBASE-20927.master.002.patch

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558051#comment-16558051
 ] 

Sergey Soldatov commented on HBASE-20927:
-

[~yuzhih...@gmail.com] added a test case. 

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20927:

Attachment: HBASE-20927-master.patch

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
> Attachments: HBASE-20927-master.patch
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-26 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20927:

Fix Version/s: 3.0.0

> RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not 
> processed yet.
> 
>
> Key: HBASE-20927
> URL: https://issues.apache.org/jira/browse/HBASE-20927
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 3.0.0, 2.1.1
>
>
> Admin.clearDeadServers is supposed to return the list of servers that were 
> not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is 
> thrown:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
>  org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers 
> to remove cannot be null or empty.
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
>   at 
> org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
>   at 
> org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
>   at 
> org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> That happens because in postClearDeadServers it calls 
> groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
> empty.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.

2018-07-23 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20927:
---

 Summary: RSGroupAdminEndpoint doesn't handle clearing dead servers 
if they are not processed yet.
 Key: HBASE-20927
 URL: https://issues.apache.org/jira/browse/HBASE-20927
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
 Fix For: 2.1.1


Admin.clearDeadServers is supposed to return the list of servers that were not 
cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is thrown:
{noformat}
Caused by: 
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException):
 org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to 
remove cannot be null or empty.
at 
org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573)
at 
org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519)
at 
org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607)
at 
org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604)
at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540)
at 
org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614)
at 
org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604)
at 
org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)

{noformat}

That happens because in postClearDeadServers it calls 
groupAdminServer.removeServers(clearedServer) even if the clearedServer is 
empty.
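
The guard implied by the description is roughly the following. This is a method-level 
sketch only, with the hook signature assumed from the stack trace above; it is not the 
committed patch.

{code:java}
// Sketch of RSGroupAdminEndpoint#postClearDeadServers with the missing emptiness check.
// Signature and types are assumptions based on the stack trace in this issue.
@Override
public void postClearDeadServers(ObserverContext<MasterCoprocessorEnvironment> ctx,
    List<ServerName> servers, List<ServerName> notClearedServers) throws IOException {
  Set<Address> clearedServer = new HashSet<>();
  for (ServerName server : servers) {
    if (!notClearedServers.contains(server)) {
      clearedServer.add(server.getAddress());   // keep only the servers actually cleared
    }
  }
  if (!clearedServer.isEmpty()) {               // the check this issue asks for
    groupAdminServer.removeServers(clearedServer);
  }
}
{code}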



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20926) IntegrationTestRSGroup is broken

2018-07-23 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20926:
---

 Summary: IntegrationTestRSGroup is broken 
 Key: HBASE-20926
 URL: https://issues.apache.org/jira/browse/HBASE-20926
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
 Fix For: 2.1.1


There are several problems:
1. It doesn't work in minicluster mode, because afterMethod() uses 
IntegrationTestingUtility.restoreCluster(), which simply shuts the minicluster 
down when the cluster is not distributed.
2. It uses tests from TestRSGroups, which was supposed to be common for both 
unit and integration tests, but over the last two years a number of tests were 
added there that use internal API and are not compatible with the distributed 
mode. 
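
For problem (1), a fix could look roughly like the sketch below: only call 
restoreCluster() when the test really runs against a distributed cluster. This is an 
assumption-level sketch, not the committed change; TEST_UTIL is an assumed field name.

{code:java}
// Illustrative only: skip restoreCluster() in minicluster mode so afterMethod()
// does not shut the minicluster down between tests.
@After
public void afterMethod() throws Exception {
  IntegrationTestingUtility util = (IntegrationTestingUtility) TEST_UTIL;
  if (util.isDistributedCluster()) {
    util.restoreCluster();
  }
  // in minicluster mode, leave the cluster running for the next test
}
{code}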



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20784) Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager

2018-07-06 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535263#comment-16535263
 ] 

Sergey Soldatov commented on HBASE-20784:
-

[~Apache9] it works perfectly for me. Thank you!

> Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager
> 
>
> Key: HBASE-20784
> URL: https://issues.apache.org/jira/browse/HBASE-20784
> Project: HBase
>  Issue Type: Bug
>  Components: master, UI
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-20784.patch
>
>
> In HBASE-20722 we removed the usage of RegionServerTracker when getting 
> information for region server. And version in server manager is a int, and we 
> convert it to a String when displaying it on the master ui, so we will lose 
> the SNAPSHOT suffix. Not a big one as this is not a problem for normal 
> releases. Open a issue for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20784) Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager

2018-07-05 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534361#comment-16534361
 ] 

Sergey Soldatov commented on HBASE-20784:
-

The problem is wider than just losing the SNAPSHOT suffix.
1. It may completely lose the RS version (shows 0.0.0 in the master UI for 
version 2.1-SNAPSHOT).
2. It affects any custom build, e.g. builds from CI with the build number as a 
suffix. The build number will be stripped, which may be a real issue for devops 
during a rolling upgrade on large clusters.
3. It affects any vendor-specific builds.
4. It triggers the report about an inconsistent version between the Master and 
the RSes.

We may consider using int versions on the master side as well, but that would 
solve (4) only. 
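
A tiny self-contained demo of why packing the version into an int loses the suffix 
(this is not the ServerManager code, just an illustration of the effect):

{code:java}
// Once major/minor/patch are packed into an int, "2.1.0-SNAPSHOT" and "2.1.0"
// become indistinguishable; anything that is not purely numeric is lost.
public final class VersionIntDemo {
  static int encode(String version) {
    String[] parts = version.split("-")[0].split("\\.");   // "-SNAPSHOT" is dropped here
    return Integer.parseInt(parts[0]) * 1_000_000
         + Integer.parseInt(parts[1]) * 1_000
         + Integer.parseInt(parts[2]);
  }

  static String decode(int v) {
    return (v / 1_000_000) + "." + (v / 1_000 % 1_000) + "." + (v % 1_000);
  }

  public static void main(String[] args) {
    System.out.println(decode(encode("2.1.0-SNAPSHOT")));   // prints 2.1.0
  }
}
{code}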


> Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager
> 
>
> Key: HBASE-20784
> URL: https://issues.apache.org/jira/browse/HBASE-20784
> Project: HBase
>  Issue Type: Bug
>  Components: master, UI
>Reporter: Duo Zhang
>Priority: Minor
>
> In HBASE-20722 we removed the usage of RegionServerTracker when getting 
> information for region server. And version in server manager is a int, and we 
> convert it to a String when displaying it on the master ui, so we will lose 
> the SNAPSHOT suffix. Not a big one as this is not a problem for normal 
> releases. Open a issue for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20852) inconsistent version report in Master UI after HBASE-20722

2018-07-05 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20852:
---

 Summary: inconsistent version report in Master UI after HBASE-20722
 Key: HBASE-20852
 URL: https://issues.apache.org/jira/browse/HBASE-20852
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0, 2.1.0
Reporter: Sergey Soldatov


The Master web UI is able to report when the versions of an RS and the master 
differ. Previously the check was performed between the master version that we 
get from the Version class (generated during the build) and the RS version from 
RegionServerTracker (which was using the same Version class). In HBASE-20722 
this behavior was changed, so now for an RS we get a numeric version and 
convert it to a string representation. That works only for purely numeric 
versions like 2.0.0 / 2.1.0; it doesn't work for -SNAPSHOT or any other custom 
versions, and the Master UI reports that all region servers have an 
inconsistent version.

[~Apache9], [~stack], [~tedyu] FYI



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20826) Truncate responseInfo attributes on RpcServer WARN messages

2018-06-29 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528467#comment-16528467
 ] 

Sergey Soldatov commented on HBASE-20826:
-

[~yuzhih...@gmail.com] Actually my point is that this problem should be handled 
on both sides. So I'm happy with the suggested patch, and I'm going to do some 
work on the Phoenix side as well. I just want to be sure that I still get a 
full log at TRACE level (sometimes we need to find duplicated values in the 
filter, and that would not be possible without it). 

> Truncate responseInfo attributes on RpcServer WARN messages
> ---
>
> Key: HBASE-20826
> URL: https://issues.apache.org/jira/browse/HBASE-20826
> Project: HBase
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Sergey Soldatov
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0
>
> Attachments: HBASE-20826.001.branch-2.0.patch, 
> HBASE-20826.002.branch-2.0.patch
>
>
> With Phoenix in the picture, dumping the {{Call}} protobuf to the RS log can 
> get *really* chatty, real fast. Notably, some serialized filters just spam 
> the log with binary garbage.
> Let's add an upper-limit to the length of params we'll put out at WARN, and 
> leave the full content for TRACE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20826) Truncate responseInfo attributes on RpcServer WARN messages

2018-06-29 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528303#comment-16528303
 ] 

Sergey Soldatov commented on HBASE-20826:
-

In the first place, this should be handled at the Filter level as well: Phoenix 
filters should be aware of what they put into toString(), otherwise they would 
spend too much time just building 10-20 MB strings and would also increase the 
memory pressure. And at the TRACE level we shouldn't truncate anything; 
otherwise it may cause problems with identifying the root cause of a problem. 
Just my 2 cents.
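
The overall idea discussed here can be sketched as below: cap the dumped parameters at 
WARN and keep them complete at TRACE. This is an assumed helper for illustration, not 
the committed HBASE-20826 patch, and the 1000-character limit is an arbitrary example 
value.

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class ResponseLogging {
  private static final Logger LOG = LoggerFactory.getLogger(ResponseLogging.class);
  private static final int WARN_PARAM_LIMIT = 1000;   // example value

  static void logSlowCall(String callDetails) {
    if (LOG.isTraceEnabled()) {
      LOG.trace("(responseTooSlow): {}", callDetails);            // full content at TRACE
    } else {
      LOG.warn("(responseTooSlow): {}", truncate(callDetails));   // truncated at WARN
    }
  }

  static String truncate(String s) {
    return s.length() <= WARN_PARAM_LIMIT
        ? s
        : s.substring(0, WARN_PARAM_LIMIT) + "... <truncated>";
  }
}
{code}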

> Truncate responseInfo attributes on RpcServer WARN messages
> ---
>
> Key: HBASE-20826
> URL: https://issues.apache.org/jira/browse/HBASE-20826
> Project: HBase
>  Issue Type: Improvement
>  Components: rpc
>Reporter: Sergey Soldatov
>Assignee: Josh Elser
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0
>
> Attachments: HBASE-20826.001.branch-2.0.patch, 
> HBASE-20826.002.branch-2.0.patch
>
>
> With Phoenix in the picture, dumping the {{Call}} protobuf to the RS log can 
> get *really* chatty, real fast. Notably, some serialized filters just spam 
> the log with binary garbage.
> Let's add an upper-limit to the length of params we'll put out at WARN, and 
> leave the full content for TRACE.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.

2018-06-12 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20719:
---

 Summary: HTable.batch() doesn't handle TableNotFound correctly.
 Key: HBASE-20719
 URL: https://issues.apache.org/jira/browse/HBASE-20719
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.1.0
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
 Fix For: 2.1.0


batch() as well as delete() are processed using AsyncRequest. To report 
problems we use RetriesExhaustedWithDetailsException, and there is no special 
handling for the TableNotFound exception. So the final result of running batch 
or delete operations against a non-existent table looks really weird and 
misleading:
{noformat}
hbase(main):003:0> delete 't1', 'r1', 'c1'
2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot get 
replica 0 location for 
{"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}

ERROR: Failed 1 action: t1: 1 time, servers with issues: null
{noformat}
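
For comparison, this is roughly what a caller has to do today to find out that the 
table is simply missing: dig the TableNotFoundException out of 
RetriesExhaustedWithDetailsException. A sketch under the assumption that the exception 
exposes its per-action causes via getNumExceptions()/getCause(i); it is not part of 
the fix itself.

{code:java}
import java.util.Arrays;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.TableNotFoundException;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.client.Table;

public class BatchDeleteExample {
  static void deleteRow(Connection conn, String table, byte[] row) throws Exception {
    try (Table t = conn.getTable(TableName.valueOf(table))) {
      Object[] results = new Object[1];
      t.batch(Arrays.asList(new Delete(row)), results);
    } catch (RetriesExhaustedWithDetailsException e) {
      for (int i = 0; i < e.getNumExceptions(); i++) {
        if (e.getCause(i) instanceof TableNotFoundException) {
          throw new TableNotFoundException(table);   // surface the real problem
        }
      }
      throw e;
    }
  }
}
{code}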



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-05 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502708#comment-16502708
 ] 

Sergey Soldatov commented on HBASE-20657:
-

[~stack] Actually the patch I provided makes MTP hold the lock, so while it is 
not finished it keeps the exclusive lock (exactly what [~Apache9] mentioned in 
the last comment). But that leads to the concurrent execution problem that I've 
addressed in the MasterProcedureScheduler changes. So the patch fixes the case 
when the table is healthy, all regions are online, and 2 concurrent MTPs are 
executed. 

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.1
>
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-05 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-3-branch-2.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-05 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-2-branch-2.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-2-branch-2.patch, HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-04 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Status: Patch Available  (was: Open)

Changes that I've made:
1. Made ModifyTableProcedure hold a lock. That prevents concurrent execution of 
ModifyTable procedures.
2. Added an additional condition for when we may remove the queue (TableQueue 
in this case) from the fair queue. We remove it only if the next procedure not 
only has a different parent but also does not belong to the current lock 
holder. We need that to prevent a race condition that may happen during the 
execution of a complex procedure that has a chain of subprocedures. 
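
A rough illustration of the condition in item 2, with hypothetical names (this is not 
the MasterProcedureScheduler code, just the shape of the check):

{code:java}
// The queue may leave the fair queue only when the next procedure has a different
// parent AND does not belong to the procedure currently holding the exclusive lock.
static boolean canRemoveFromFairQueue(long nextProcParentId, long queueOwnerParentId,
                                      long nextProcRootId, long lockOwnerRootId) {
  boolean differentParent = nextProcParentId != queueOwnerParentId;
  boolean belongsToLockOwner = nextProcRootId == lockOwnerRootId;
  return differentParent && !belongsToLockOwner;
}
{code}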

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-04 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-1-branch-2.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-1-branch-2.patch, 
> HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-04 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501001#comment-16501001
 ] 

Sergey Soldatov commented on HBASE-20657:
-

[~elserj] thank you for pointing that out. I've checked HBASE-20634 already 
([~an...@apache.org] mentioned it offline). These bugs are unrelated, and the 
failure is reproducible on top of branch-2.  

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-01 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Attachment: HBASE-20657-testcase-branch2.patch

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-06-01 Thread Sergey Soldatov (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498727#comment-16498727
 ] 

Sergey Soldatov commented on HBASE-20657:
-

Let's consider the case that [~an...@apache.org] found. Attaching the test case 
to reproduce the problem.
What happens:
1. The first ModifyTableProcedure (MTP) gets the exclusive lock, performs all meta 
modifications, and creates subprocedures to reopen all regions. All those 
subprocedures go to the TableQueue. After that, it releases the lock. 
2. The second MTP gets the exclusive lock, performs all meta modifications, and 
tries to create subprocedures. Since during the reopen we create 
MoveRegionProcedure, we fail immediately in the constructor of 
MoveRegionProcedure, because some regions may be closed at that moment 
due to the first MTP. 
3. Since the lock was not released in (2), the scheduler continuously tries 
to execute the second MTP, which fails over and over again. 
[~stack]  that's the problem I've tried to describe in the discussion in 
HBASE-20202. There are several directions (not ways, because none of them 
actually works) to solve it:
1. Remove the call of openRegion in the MoveRegionProcedure constructor. We would 
still have the problem that, during the execution of the subprocedures of the first 
MTP, the metadata is updated with the new values from the second MTP and the 
regions start failing during reopen (as in the example - the report will be that 
the compression setting is incorrect).
2. Make MTP hold its lock. That doesn't work either. Even for a single MTP call, 
it gets stuck if there is more than one worker for proc-v2. The reason is 
MasterProcedureScheduler#doPoll :
https://github.com/apache/hbase/blob/74ef118e9e2246c09280ebb7eb6552ef91bdd094/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L200
Since MTP creates 3 levels of subprocedures, due to the concurrent execution the 
queue may contain a mix of subprocedures that have different parents, and all 
workers may remove that table queue from the fairqueue at the same time, so 
there will be nothing for them to execute and some regions get stuck in the 
RiT state. Not sure, but there is a chance that this is a side effect of 
HBASE-2, so FYI [~Apache9] 

Another question is related to (1) from the description. Is it expected that 
during a retry from the client we generate a new nonce key for the same 
procedure? 
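
For illustration only, and assuming the change would be on the client side, a hedged 
sketch of reusing the same nonce pair across retries so that the server-side nonce 
manager can recognize the retry as the already running procedure (this is not the 
current Admin implementation):
{noformat}
// Hypothetical sketch: generate the nonce pair once, outside the retry loop,
// instead of calling ng.newNonce() again for every retry attempt.
final long nonceGroup = ng.getNonceGroup();
final long nonce = ng.newNonce();
ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
    td.getTableName(), td, nonceGroup, nonce);
// On an RPC retry, resend the same request (same nonceGroup/nonce), so the
// master can map the retry to the existing procedure instead of starting a new one.
{noformat}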

Any thoughts, suggestions or directions are very appreciated. 



Also FYI [~an...@apache.org], [~elserj]

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-20657-testcase-branch2.patch
>
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-05-31 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov reassigned HBASE-20657:
---

Assignee: Sergey Soldatov

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may stuck

2018-05-29 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20657:
---

 Summary: Retrying RPC call for ModifyTableProcedure may stuck
 Key: HBASE-20657
 URL: https://issues.apache.org/jira/browse/HBASE-20657
 Project: HBase
  Issue Type: Bug
  Components: Client, proc-v2
Affects Versions: 2.0.0
Reporter: Sergey Soldatov


Env: 2 masters, 1 RS. 
Steps to reproduce: the active master is killed while a ModifyTableProcedure is 
being executed. 
If the table has enough regions, it may happen that by the time the secondary master 
becomes active some of the regions are closed, so once the client retries the call to 
the new active master, a new ModifyTableProcedure is created and gets stuck 
during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
1. When we are retrying from the client side, we call modifyTableAsync, which creates 
a procedure with a new nonce key:
{noformat}
ModifyTableRequest request = RequestConverter.buildModifyTableRequest(
    td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
{noformat}
So on the server side, it's considered a new procedure and starts executing 
immediately.
2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS, we create a 
MoveRegionProcedure for each region, but it checks whether the region is online 
(and it's not), so it fails immediately, forcing the procedure to restart.

[~an...@apache.org] saw a similar case when two concurrent ModifyTable 
procedures were running and got stuck in a similar way. 





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck

2018-05-29 Thread Sergey Soldatov (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-20657:

Summary: Retrying RPC call for ModifyTableProcedure may get stuck  (was: 
Retrying RPC call for ModifyTableProcedure may stuck)

> Retrying RPC call for ModifyTableProcedure may get stuck
> 
>
> Key: HBASE-20657
> URL: https://issues.apache.org/jira/browse/HBASE-20657
> Project: HBase
>  Issue Type: Bug
>  Components: Client, proc-v2
>Affects Versions: 2.0.0
>Reporter: Sergey Soldatov
>Priority: Major
>
> Env: 2 masters, 1 RS. 
> Steps to reproduce: Active master is killed while ModifyTableProcedure is 
> executed. 
> If the table has enough regions it may come that when the secondary master 
> get active some of the regions may be closed, so once client retries the call 
> to the new active master, a new ModifyTableProcedure is created and get stuck 
> during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because:
> 1. When we are retrying from client side, we call modifyTableAsync which 
> create a procedure with a new nonce key:
> {noformat}
>  ModifyTableRequest request = 
> RequestConverter.buildModifyTableRequest(
> td.getTableName(), td, ng.getNonceGroup(), ng.newNonce());
> {noformat}
>  So on the server side, it's considered as a new procedure and starts 
> executing immediately.
> 2. When we are processing  MODIFY_TABLE_REOPEN_ALL_REGIONS we create 
> MoveRegionProcedure for each region, but it checks whether the region is 
> online (and it's not), so it fails immediately, forcing the procedure to 
> restart.
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable 
> procedures were running and got stuck in the similar way. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20621) Unclear error for deleting from not existing table.

2018-05-22 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-20621:
---

 Summary: Unclear error for deleting from not existing table. 
 Key: HBASE-20621
 URL: https://issues.apache.org/jira/browse/HBASE-20621
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.0.0
Reporter: Sergey Soldatov


When I try to delete a row from a non-existing table, the error is quite 
confusing. Instead of getting a table-not-found exception, I got 
{noformat}
ERROR [main] client.AsyncRequestFutureImpl: Cannot get replica 0 location for 
{"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807}

ERROR: Failed 1 action: t1: 1 time, servers with issues: null
{noformat}
That happens because delete uses AsyncRequestFuture, which wraps all region 
location errors into a 'Cannot get replica' error. I expect that other actions 
like batch, mutateRow, and checkAndDelete behave in the same way. 
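
For illustration only, a hedged sketch of reproducing this from the Java client 
(the table name, row, and family match the log above; the connection setup is an 
assumption, not taken from the original report):
{noformat}
// Hypothetical reproduction: delete a row from table 't1', which does not exist.
try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
     Table table = conn.getTable(TableName.valueOf("t1"))) {
  Delete delete = new Delete(Bytes.toBytes("r1"));
  delete.addFamily(Bytes.toBytes("c1"));
  // Instead of a clear table-not-found error, this surfaces the
  // "Cannot get replica 0 location ..." message shown above.
  table.delete(delete);
}
{noformat}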




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20202) [AMv2] Don't move region if its a split parent or offlined

2018-05-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478340#comment-16478340
 ] 

Sergey Soldatov commented on HBASE-20202:
-

bq. On the the new master, it will notice this and re-run the step that was 
doing the close step... That is what should happen sir.
Correct. It happens almost in that way, but in my case it's a bit more 
complicated:
We modified a table (chaos monkey changed the bloom filter), ModifyTableProcedure 
was executed, and the master was killed during execution of an 
UnassignProcedure while one of the regions was closing. 
The new master had the following list of unsuccessful procedures during 
startup:
UnassignProcedure pid=766, ppid=754
MoveRegionProcedure pid=754, ppid=749
ModifyTableProcedure pid=749
Once the master got online, the UnassignProcedure started:
{noformat}
2018-05-15 21:48:23,787 INFO  [PEWorker-8] assignment.RegionStateStore: pid=766 
updating hbase:meta row=92e0d39ee7e6d19566c393bae58ab5c0, regionState=CLOSING
2018-05-15 21:48:23,820 INFO  [PEWorker-8] 
assignment.RegionTransitionProcedure: Dispatch pid=766, ppid=754, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
table=IntegrationTestBigLinkedList, region=92e0d39ee7e6d19566c393bae58ab5c0, 
server=ctr-e138-15181
{noformat}
Not sure what exactly triggered that (retry logic from the client? I don't 
see any logs from the chaos monkey for that action), but suddenly the master 
tries to execute another ModifyTableProcedure. It starts from the state 
MODIFY_TABLE_REOPEN_ALL_REGIONS, which tries to reopen all regions using 
MoveRegionProcedure. And because our region is not online, it fails due to that 
checkOnline:
{noformat}
2018-05-15 21:48:29,684 INFO  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] bzip2.Bzip2Factory: 
Successfully loaded & initialized native-bzip2 library system-native
2018-05-15 21:48:29,685 INFO  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] compress.CodecPool: 
Got brand-new compressor [.bz2]
2018-05-15 21:48:29,744 INFO  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] compress.CodecPool: 
Got brand-new compressor [.lz4]
2018-05-15 21:48:29,788 INFO  
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] master.HMaster: 
Client=hbase//172.27.86.70 modify IntegrationTestBigLinkedList
2018-05-15 21:48:29,902 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] 
procedure2.ProcedureExecutor: Stored pid=790, 
state=RUNNABLE:MODIFY_TABLE_PREPARE; ModifyTableProcedure 
table=IntegrationTestBigLinkedList
2018-05-15 21:48:29,969 DEBUG [PEWorker-4] util.FSTableDescriptors: Wrote into 
hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.06
2018-05-15 21:48:29,972 DEBUG [PEWorker-4] util.FSTableDescriptors: Deleted 
hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.05
2018-05-15 21:48:29,972 INFO  [PEWorker-4] util.FSTableDescriptors: Updated 
tableinfo=hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.06
2018-05-15 21:48:30,004 WARN  [PEWorker-4] procedure.ModifyTableProcedure: 
Retriable error trying to modify table=IntegrationTestBigLinkedList (in 
state=MODIFY_TABLE_REOPEN_ALL_REGIONS)
org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
92e0d39ee7e6d19566c393bae58ab5c0 is not OPEN
at 
org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193)
at 
org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.<init>(MoveRegionProcedure.java:67)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705)
at 
org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128)
at 
org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50)
at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760)
2018-05-15 21:48:30,027 WARN  [PEWorker-4] procedure.ModifyTableProcedure: 
Retriable error trying to modify 

[jira] [Commented] (HBASE-20202) [AMv2] Don't move region if its a split parent or offlined

2018-05-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477967#comment-16477967
 ] 

Sergey Soldatov commented on HBASE-20202:
-

[~stack] In your fix you added a check to the MoveRegionProcedure constructor 
that the region is open:
{noformat}
  public MoveRegionProcedure(final MasterProcedureEnv env, final RegionPlan plan)
      throws HBaseIOException {
    super(env, plan.getRegionInfo());
    this.plan = plan;
    preflightChecks(env, true);
    checkOnline(env, plan.getRegionInfo());
  }
{noformat}
 What will happen if the master crashes during UnassignProcedure execution and 
the region goes to the CLOSED state? Recently we had a case when the new master 
was trying to recover from a crash and got stuck because the region was in the 
CLOSED state and this check prevented the procedure from being created. 
FYI [~elserj] 
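
For illustration only, a hedged sketch of the concern: if the online check were 
performed when the procedure starts executing rather than in the constructor, 
crash recovery could still re-create the procedure for a region that is 
temporarily CLOSED. This is an assumption about one possible direction, not the 
actual HBase code, and the state-machine signature is simplified:
{noformat}
// Hypothetical variant: the constructor no longer rejects a non-OPEN region...
public MoveRegionProcedure(final MasterProcedureEnv env, final RegionPlan plan)
    throws HBaseIOException {
  super(env, plan.getRegionInfo());
  this.plan = plan;
  preflightChecks(env, true);
}

// ...and the region state is validated at execution time instead (signature
// simplified here), where a retriable failure can be handled by the framework.
protected Flow executeFromState(MasterProcedureEnv env, MoveRegionState state)
    throws IOException {
  if (state == MoveRegionState.MOVE_REGION_UNASSIGN) {
    checkOnline(env, plan.getRegionInfo());
  }
  // ... rest of the existing state machine
  return Flow.HAS_MORE_STATE;
}
{noformat}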

> [AMv2] Don't move region if its a split parent or offlined
> --
>
> Key: HBASE-20202
> URL: https://issues.apache.org/jira/browse/HBASE-20202
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2
>Affects Versions: 2.0.0-beta-2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20202.branch-2.001.patch, 
> HBASE-20202.branch-2.002.patch, HBASE-20202.branch-2.003.patch, 
> HBASE-20202.branch-2.003.patch
>
>
> Found this one running ITBLLs. We'd just finished splitting a region 
> 91655de06786f786b0ee9c51280e1ee6 and then a move for it comes in. The move 
> fails in an interesting way. The location has been removed from the 
> regionnode kept by the Master. HBASE-20178 adds macro checks on context. Need 
> to add a few checks to the likes of MoveRegionProcedure so we don't try to 
> move an offlined/split parent.
> {code}
> 2018-03-14 10:21:45,678 INFO  [PEWorker-2] procedure2.ProcedureExecutor: 
> Finished pid=3177, state=SUCCESS; SplitTableRegionProcedure 
> table=IntegrationTestBigLinkedList, parent=91655de06786f786b0ee9c51280e1ee6, 
> daughterA=b67bf6b79eaa83de788b0519f782ce8e, 
> daughterB=99cf6ddb38cad08e3aa7635b6cac2e7b in 10.0210sec   
> 2018-03-14 10:21:45,679 INFO  [PEWorker-15] 
> procedure.MasterProcedureScheduler: pid=3194, ppid=3193, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, 
> server=ve0530.halxg.cloudera.com,16020,1521007509855, 
> IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0.
> 2018-03-14 10:21:45,680 INFO  [PEWorker-5] 
> procedure.MasterProcedureScheduler: pid=3187, 
> state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure 
> hri=IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6.,
>  source=ve0530.halxg.cloudera.com,16020,1521007509855, 
> destination=ve0528.halxg.cloudera.com,16020,1521047890874, 
> IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6.
> 2018-03-14 10:21:45,680 INFO  [PEWorker-15] assignment.RegionStateStore: 
> pid=3194 updating hbase:meta 
> row=IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0.,
>  regionState=CLOSING
> 2018-03-14 10:21:45,680 INFO  [PEWorker-5] procedure2.ProcedureExecutor: 
> Initialized subprocedures=[{pid=3195, ppid=3187, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, 
> server=ve0530.halxg.cloudera.com,16020,1521007509855}]
> 2018-03-14 10:21:45,683 INFO  [PEWorker-15] 
> assignment.RegionTransitionProcedure: Dispatch pid=3194, ppid=3193, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, 
> server=ve0530.halxg.cloudera.com,16020,1521007509855; rit=CLOSING, 
> location=ve0530.halxg.cloudera.com,16020,1521007509855
> 2018-03-14 10:21:45,752 INFO  [PEWorker-15] 
> procedure.MasterProcedureScheduler: pid=3195, ppid=3187, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, 
> server=ve0530.halxg.cloudera.com,16020,1521007509855, 
> IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6.
> 2018-03-14 10:21:45,753 ERROR [PEWorker-15] procedure2.ProcedureExecutor: 
> CODE-BUG: Uncaught runtime exception: pid=3195, ppid=3187, 
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure 
> table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, 
> server=ve0530.halxg.cloudera.com,16020,1521007509855
> java.lang.NullPointerException

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-28 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380788#comment-16380788
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~elserj] well, my changes are quite simple. 
Was: a method that had *hcd.setBloomFilterType(BloomType.NONE);*
Now: a new method that has *hcd.setBloomFilterType(type);* with an additional 
parameter for the bloom type, and it is *this* method that we call from our test. 
The old method just calls the new one with BloomType.NONE as the parameter, so 
there is no change in behavior. 
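
For illustration only, a minimal sketch of the refactoring described above (the 
helper name, its other parameters, and the table-creation details are 
illustrative, not the exact test-utility code):
{noformat}
// Old helper: keeps the previous behavior by delegating with BloomType.NONE.
public void createTestTable(TableName tableName, byte[] family) throws IOException {
  createTestTable(tableName, family, BloomType.NONE);
}

// New helper: identical logic, but the bloom filter type is now a parameter,
// so the test for this issue can request ROWCOL explicitly.
public void createTestTable(TableName tableName, byte[] family, BloomType type)
    throws IOException {
  HColumnDescriptor hcd = new HColumnDescriptor(family);
  hcd.setBloomFilterType(type);
  // ... create the table with this column descriptor, exactly as before
}
{noformat}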

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-1.4.patch, 
> HBASE-19863.v5-branch-1.patch, HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-28 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380728#comment-16380728
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~ram_krish] actually we were setting it to NONE by default:
{noformat}
-  // Disable blooms (they are on by default as of 0.95) but we disable them here because
-  // tests have hard coded counts of what to expect in block cache, etc., and blooms being
-  // on is interfering.
-  builder.addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(family)
-      .setBloomFilterType(BloomType.NONE)
-      .build());
{noformat}
That's the part of the code that this method had previously.  

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-1.4.patch, 
> HBASE-19863.v5-branch-1.patch, HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-26 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v5-branch-1.4.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-1.4.patch, 
> HBASE-19863.v5-branch-1.patch, HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-26 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v5-branch-1.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-1.patch, 
> HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-26 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377500#comment-16377500
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~yuzhih...@gmail.com] addressed your comments in v5. 
[~chia7712] Sure, I will do it today. 

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-26 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v5-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-23 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v4-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-23 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: (was: HBASE-19863.v4-branch-2.patch)

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>  

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-23 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375057#comment-16375057
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~yuzhih...@gmail.com] Ouch, I didn't notice that. Reattached.

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-23 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375017#comment-16375017
 ] 

Sergey Soldatov commented on HBASE-19863:
-

That's exactly what I have in the patch:

{noformat}
+@Category({ RegionServerTests.class, FilterTests.class, MediumTests.class })
+public class TestIsDeleteFailure {
+  private static final HBaseTestingUtility TEST_UTIL = new 
HBaseTestingUtility();
+
+  @ClassRule
+  public static final HBaseClassTestRule CLASS_RULE =
+  HBaseClassTestRule.forClass(TestIsDeleteFailure.class);
+
{noformat}
Both patches, for branch-2 and master, are the same, but this failure happens only 
on branch-2 and is not reproducible locally. 

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-23 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374898#comment-16374898
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~yuzhih...@gmail.com] That looks like a flaky test, and the failure is not related 
to the patch: it passed locally and, as we can see, it passed in the branch-2 precommit 
run. I am more worried about the failure for the branch-2 patch. I don't understand where 
this assertion error comes from:
{noformat}
java.lang.AssertionError: No HBaseClassTestRule ClassRule for 
org.apache.hadoop.hbase.regionserver.TestIsDeleteFailure
{noformat}
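
For context, here is a minimal sketch of the kind of class-level rule that assertion is checking for, mirroring the @ClassRule/@Category pattern quoted from the patch elsewhere in this thread; the category annotations and the placeholder test body are illustrative, not the actual test from the attachment.

{code}
import org.apache.hadoop.hbase.HBaseClassTestRule;
import org.apache.hadoop.hbase.testclassification.FilterTests;
import org.apache.hadoop.hbase.testclassification.MediumTests;
import org.apache.hadoop.hbase.testclassification.RegionServerTests;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.experimental.categories.Category;

@Category({ RegionServerTests.class, FilterTests.class, MediumTests.class })
public class TestIsDeleteFailure {

  // Branch-2/master expect each test class to declare an HBaseClassTestRule
  // @ClassRule (conventionally named CLASS_RULE); without it the run fails with
  // the "No HBaseClassTestRule ClassRule for ..." AssertionError quoted above.
  @ClassRule
  public static final HBaseClassTestRule CLASS_RULE =
      HBaseClassTestRule.forClass(TestIsDeleteFailure.class);

  @Test
  public void testPlaceholder() {
    // The real test logic lives in the attached patch; this skeleton only shows the rule.
  }
}
{code}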


> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-21 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v4-master.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, 
> HBASE-19863.v4-master.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-21 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v4-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = 

[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-02-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367941#comment-16367941
 ] 

Sergey Soldatov commented on HBASE-19805:
-

[~stack] Sorry, I missed the previous notification. Last time I ended up with 
some weird scenarios where everything got stuck without any visible problems, and 
I didn't have a chance to dig into it due to $dayjob. I will try to get back to this 
during the weekend. 

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while (numRegions == newRegionSize) {
>   Thread.sleep(50);
>   newRegionSize = admin.getRegions(tn).size();
> }
>   }
> {code}
> A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv 
> env,
>   final ServerName regionLocation, final RegionInfo hri, boolean 
> includeBestSplitRow)
>   throws IOException {
> // TODO: There is no timeout on this controller. Set one!
> HBaseRpcController controller = 
> env.getMasterServices().getClusterConnection().
> 

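As a side note on the reproduction quoted above: the toy program waits for each split by polling the region count in an unbounded loop, which is also where a run can appear to hang when a split never completes. Below is a hedged sketch of the same wait with a deadline, using only the Admin calls already present in the snippet; the helper name, the exception, and the timeout handling are illustrative and not part of the original program.

{code}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

public final class SplitUtil {
  private SplitUtil() {}

  /**
   * Requests a split at the given point and waits until the region count grows,
   * failing after the deadline instead of looping forever like the toy program.
   */
  static void splitAndWait(Admin admin, TableName tn, String splitPoint, long timeoutMs)
      throws Exception {
    int before = admin.getRegions(tn).size();
    admin.split(tn, Bytes.toBytes(splitPoint));
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (admin.getRegions(tn).size() == before) {
      if (System.currentTimeMillis() > deadline) {
        throw new IllegalStateException("Split on " + splitPoint + " did not complete in "
            + timeoutMs + " ms");
      }
      Thread.sleep(50);
    }
  }
}
{code}
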
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366693#comment-16366693
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~chia7712] I've moved the test to a separate file. I don't think it affects 
trySkipToNextRow: this problem happens inside a single row only, so even if we 
jump back when getting a new top, we will just iterate to the next row using next() 
anyway. As for changing realSeekDone to false, my thinking was that it was done 
specifically to avoid some seeks (not only in the case we are discussing), so that 
kind of change may cause a performance degradation in some cases?

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-15 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v3-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, 
> HBASE-19863.v3-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = getNextIndexedKey();
>   if (nextIndexedKey != 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-14 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363718#comment-16363718
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~yuzhih...@gmail.com] You mean checkstyle? I addressed that in the v2 patch.

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-14 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v2-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = getNextIndexedKey();
>   if (nextIndexedKey != null && nextIndexedKey != 
> 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-13 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, 
> HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = getNextIndexedKey();
>   if (nextIndexedKey != null && nextIndexedKey != 
> KeyValueScanner.NO_NEXT_INDEXED_KEY
>  

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-12 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361916#comment-16361916
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~ram_krish] thank you, sir! I will prepare the final patch tomorrow (it would 
require some changes in the test harness to make the bloom filter configurable).

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-06 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355021#comment-16355021
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~anoop.hbase] thank you, sir! The question is whether it's ok to do a seek if 
during the heap.next() we choose another scanner which actually lags behind 
(for example, cur points to C10, but hfs is still pointing to C1). If we try to 
skip here we may break the order (so, at the moment we are at C10, but on the 
next call we will get C2 instead of C11). I tried to find a better place to fix 
it but didn't find any. 

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-02-05 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352674#comment-16352674
 ] 

Sergey Soldatov commented on HBASE-19863:
-

 bq. I agree to this. But my arg would be that if getNextIndexedKey() is a key 
with the next row 
It's usually a key with the next row. And in this case, the (1) check just 
returns a value greater than zero, so we will perform next() once. After that, 
in (2) we just check that we are still in the same column, and if not we return 
true, avoiding the call to reseek.  
bq. Expectation is for compareKeyForNextColumn() check with nextcolumn.
According to the method name that's true, but according to the comments it is 
just a "result of the compare between the indexed key and the key portion of 
the passed cell". I believe that all this logic fits [~lhofhansl]'s optimization 
for SEEK vs SKIP that is implemented in HBASE-13109.
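To make the two checks easier to follow, here is a hedged, simplified model of 
that loop as plain Java (all names below are invented for illustration; the 
real logic lives in StoreScanner.trySkipToNextColumn and operates on HBase 
Cells):
{noformat}
import java.util.function.BooleanSupplier;

// Simplified sketch of the skip loop discussed above. Check (1) gates every
// heap.next() call (stay inside the current block); check (2) is the loop
// condition (the heap top is still a version of the same row/column).
final class SkipLoopModel {
  private SkipLoopModel() {
  }

  /**
   * @param withinCurrentBlock check (1): advancing stays inside the current block
   * @param advance            heap.next(): consume one cell from the heap
   * @param stillOnSameColumn  check (2): the new heap top is still the same row/column
   * @return true if the column was skipped without a reseek, false if the caller must reseek
   */
  static boolean trySkip(BooleanSupplier withinCurrentBlock, Runnable advance,
      BooleanSupplier stillOnSameColumn) {
    do {
      if (!withinCurrentBlock.getAsBoolean()) {
        return false; // hit a block boundary: a reseek is cheaper than scanning on
      }
      advance.run();
    } while (stillOnSameColumn.getAsBoolean());
    return true; // walked past all versions of the column while staying in the block
  }
}
{noformat}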

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will 

[jira] [Comment Edited] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-30 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345868#comment-16345868
 ] 

Sergey Soldatov edited comment on HBASE-19863 at 1/30/18 10:07 PM:
---

[~ram_krish] That's an interesting question: what is trySkipToNextColumn 
supposed to do? If I understand correctly, it just tries to really skip to 
the next column in the row without a reseek. There are two checks:
1.
{noformat}
matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0
{noformat}
The reason for this check is only to verify that we are still in the same 
block, so no reseek is needed.
2.
{noformat}
CellUtil.matchingRowColumn(cell, nextCell)
{noformat}
That would work when we iterate through versions of the same cell. 
Based on those two conditions we decide whether we need to reseek (at the 
moment only if we reached the block end or there are no indexed keys at all), 
or just continue to iterate in StoreScanner.next with the next column. So, my 
fix just handles the case when heap.next() changes the top scanner to one that 
lags behind, so the second check thinks that we are on the next cell, but 
actually we just jumped back and need to perform a reseek.
And yes, the original description is just a corner case of the wider problem 
when we change our top scanner to one that skipped the last reseek due to the 
optimization with fake KVs. 



was (Author: sergey.soldatov):
[~ram_krish] That's an interesting question what trySkipToNextColumn is 
supposed to do. If I understand correctly it's just tries to really skip to the 
next column in the row without reseek. There are two checks:
1.
{noformat}
matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0
{noformat}
So the reason of this check is only to check that we are still in the same 
block, so no need for reseek
2.
{noformat}
CellUtil.matchingRowColumn(cell, nextCell)
{noformat}
That would work when we iterate through versions of the same cell. 
Based on those two conditions we decide whether we need to reseek (at the 
moment only if we reached the block end or there is no indexed keys at all), or 
just continue to iterate in StoreScanner.next with new cell. So, my fix is just 
handle the case when heap.next() change the top scanner which lags behind, so 
second check thinks that we on the next cell, but actually we just jumped back 
and need to perform reseek.
And yes, the original description is just a corner case of the wider problem 
when we change our top scanner to the one which skipped last reseek due 
optimization with fake KVs. 


> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-30 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345868#comment-16345868
 ] 

Sergey Soldatov commented on HBASE-19863:
-

[~ram_krish] That's an interesting question: what is trySkipToNextColumn 
supposed to do? If I understand correctly, it just tries to really skip to the 
next column in the row without a reseek. There are two checks:
1.
{noformat}
matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0
{noformat}
The reason for this check is only to verify that we are still in the same 
block, so no reseek is needed.
2.
{noformat}
CellUtil.matchingRowColumn(cell, nextCell)
{noformat}
That would work when we iterate through versions of the same cell. 
Based on those two conditions we decide whether we need to reseek (at the 
moment only if we reached the block end or there are no indexed keys at all), 
or just continue to iterate in StoreScanner.next with the new cell. So, my fix 
just handles the case when heap.next() changes the top scanner to one that lags 
behind, so the second check thinks that we are on the next cell, but actually 
we just jumped back and need to perform a reseek.
And yes, the original description is just a corner case of the wider problem 
when we change our top scanner to one that skipped the last reseek due to the 
optimization with fake KVs. 


> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-29 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Status: Patch Available  (was: Open)

A WIP patch to deal with this issue. If, during the attempt to skip to the next 
column, we hit the case when the store scanner was changed and the new head is 
behind the cell we are looking for, we just return false to force a reseek in 
seekOrSkipToNextColumn. Not sure yet whether it covers all possible scenarios.
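A minimal sketch of that check, assuming a comparator equivalent to the one the 
KeyValueHeap uses (illustrative only, not the attached patch):
{noformat}
import java.util.Comparator;

import org.apache.hadoop.hbase.Cell;

// Hedged illustration of the guard: after heap.next(), if the heap's new top cell
// still sorts before the cell we were skipping from, the top scanner lags behind,
// and trySkipToNextColumn has to return false so that seekOrSkipToNextColumn
// falls back to a real reseek instead of pretending it skipped forward.
final class LaggingScannerGuard {
  private LaggingScannerGuard() {
  }

  static boolean topLagsBehind(Comparator<Cell> comparator, Cell skippingFrom, Cell newTop) {
    return newTop != null && comparator.compare(newTop, skippingFrom) < 0;
  }
}
{noformat}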

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> 

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-29 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863-branch1.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = getNextIndexedKey();
>   if (nextIndexedKey != null && nextIndexedKey != 
> KeyValueScanner.NO_NEXT_INDEXED_KEY
>   && 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-29 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343845#comment-16343845
 ] 

Sergey Soldatov commented on HBASE-19863:
-

bq. In this place we try to do a real seek right before actually using that 
scanner? I remember this to happen if we are in a fake key.

There are some exceptions, StoreScanner.trySkipToNextColumn being an example. If 
we have indexed keys, we just iterate through the heap without a reseek (which 
may happen in seekOrSkipToNextColumn if trySkipToNextColumn returns false).

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 

[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-29 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343811#comment-16343811
 ] 

Sergey Soldatov commented on HBASE-19863:
-

Attached a test case to reproduce the problem for branch-1. It has some changes 
in HBaseTestingUtility.createTable to set up the required conditions (bloom 
filter as well as block size). Please let me know whether it helps to reproduce 
the problem.
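For anyone reproducing this outside the test harness, a minimal sketch of the 
table setup using the branch-1 client API (the block size below is an 
illustrative assumption, not a value from the attached patch):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class CreateReproTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
        Admin admin = conn.getAdmin()) {
      // ROWCOL bloom filter and a small block size are the two conditions
      // called out as important in the issue description.
      HColumnDescriptor family = new HColumnDescriptor("0")
          .setBloomFilterType(BloomType.ROWCOL)
          .setBlocksize(1024);
      HTableDescriptor table = new HTableDescriptor(TableName.valueOf("T"));
      table.addFamily(family);
      admin.createTable(table);
    }
  }
}
{noformat}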

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {

[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-29 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863-test.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter 
> is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
>  Issue Type: Bug
>  Components: Filters
>Affects Versions: 1.4.1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Attachments: HBASE-19863-test.patch
>
>
> Under some circumstances scan with SingleColumnValueFilter may fail with an 
> exception
> {noformat} 
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
> qualifier=C2, timestamp=1516433595543, comparison result: 1 
> at 
> org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at 
> org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at 
> org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> table T with a single column family 0 that uses ROWCOL bloom filter 
> (important)  and column qualifiers C1,C2,C3,C4,C5. 
> When we fill the table for every row we put deleted cell for C3.
> The table has a single region with two HStore:
> A: start row: 0, stop row: 99 
> B: start row: 10 stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each 
> (important). 
> Store A is the result of major compaction,  so it doesn't have any deleted 
> cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
> ('0','C5',=,'binary:whatever')"}
> {noformat}  
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems. 
> Next, we start to iterate A for row 10, so read the first cell and set hfs 
> scanner to A :
> 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : 
> 10:0/C1/1/Put/x, 
> so we make B as our current store scanner. Since we are looking for 
> particular columns 
> C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn 
> which 
> would run reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
>   1. bloom filter check passesGeneralBloomFilter would set haveToSeek to 
> false because row 10 doesn't have C3 qualifier in store A.  
>   2. Since we don't have to seek we just create a fake row 
> 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
> us and it commented with :
> {noformat}
>  // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the 
> top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> 
> For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum 
> to skip C3 entirely. 
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn 
> which run first trySkipToNextColumn:
> {noformat}
>   protected boolean trySkipToNextColumn(Cell cell) throws IOException {
> Cell nextCell = null;
> do {
>   Cell nextIndexedKey = getNextIndexedKey();
>   if (nextIndexedKey != null && nextIndexedKey != 
> KeyValueScanner.NO_NEXT_INDEXED_KEY
>   && matcher.compareKeyForNextColumn(nextIndexedKey, 

[jira] [Created] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used

2018-01-25 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-19863:
---

 Summary: java.lang.IllegalStateException: isDelete failed when 
SingleColumnValueFilter is used
 Key: HBASE-19863
 URL: https://issues.apache.org/jira/browse/HBASE-19863
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 1.4.1
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov


Under some circumstances scan with SingleColumnValueFilter may fail with an 
exception
{noformat} 
java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, 
qualifier=C2, timestamp=1516433595543, comparison result: 1 
at 
org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
at 
org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
at 
org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
at 
org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
at 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
{noformat}
Conditions:
table T with a single column family 0 that uses a ROWCOL bloom filter 
(important) and column qualifiers C1,C2,C3,C4,C5. 
When we fill the table, for every row we put a deleted cell for C3.
The table has a single region with two HStores:
A: start row: 0, stop row: 99 
B: start row: 10, stop row: 99
B has newer versions of rows 10-99. Store files have several blocks each 
(important). 
Store A is the result of a major compaction, so it doesn't have any deleted 
cells (important).
So, we are running a scan like:
{noformat}
scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter 
('0','C5',=,'binary:whatever')"}
{noformat}  
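For reference, a hedged Java-client equivalent of that shell scan (1.x-era API; 
"whatever" is just the placeholder value from the shell command):
{noformat}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

final class ReproScan {
  private ReproScan() {
  }

  // Same scan as the shell command above: only columns 0:C3 and 0:C5,
  // filtered on the value of 0:C5.
  static Scan build() {
    Scan scan = new Scan();
    scan.addColumn(Bytes.toBytes("0"), Bytes.toBytes("C3"));
    scan.addColumn(Bytes.toBytes("0"), Bytes.toBytes("C5"));
    scan.setFilter(new SingleColumnValueFilter(Bytes.toBytes("0"), Bytes.toBytes("C5"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("whatever")));
    return scan;
  }
}
{noformat}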
How the scan proceeds:
First, we iterate over A for rows 0 and 1 without any problems. 
Next, we start to iterate over A for row 10, so we read the first cell and set 
the hfs scanner to A:
10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 
10:0/C1/1/Put/x, 
so we make B our current store scanner. Since we are looking for the particular 
columns C3 and C5, we perform the optimization 
StoreScanner.seekOrSkipToNextColumn, which would run reseek for all store 
scanners.
For store A the following magic would happen in requestSeek:
  1. The bloom filter check in passesGeneralBloomFilter would set haveToSeek to 
false because row 10 doesn't have the C3 qualifier in store A.  
  2. Since we don't have to seek, we just create a fake key 
10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for 
us and is commented with:
{noformat}
 // Multi-column Bloom filter optimization.
// Create a fake key/value, so that this scanner only bubbles up to the top
// of the KeyValueHeap in StoreScanner after we scanned this row/column in
// all other store files. The query matcher will then just skip this fake
// key/value and the store scanner will progress to the next column. This
// is obviously not a "real real" seek, but unlike the fake KV earlier in
// this method, we want this to be propagated to ScanQueryMatcher.
{noformat}

For store B we would set it to the fake 10:0/C3/createFirstOnRowColTS()/Maximum 
to skip C3 entirely. 
After that we start searching for qualifier C5 using seekOrSkipToNextColumn, 
which first runs trySkipToNextColumn:
{noformat}
  protected boolean trySkipToNextColumn(Cell cell) throws IOException {
    Cell nextCell = null;
    do {
      Cell nextIndexedKey = getNextIndexedKey();
      if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY
          && matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0) {
        this.heap.next();
        ++kvsScanned;
      } else {
        return false;
      }
    } while ((nextCell = this.heap.peek()) != null
        && CellUtil.matchingRowColumn(cell, nextCell));
    return true;
  }
{noformat}
If the store has several blocks, then nextIndexedKey would not be null and 
compareKeyForNextColumn wouldn't be negative, 
so we keep searching forward until we reach the next indexed key or the end of 
the row. But in this.heap.next(), the scanner for A bubbles 

[jira] [Commented] (HBASE-19774) incorrect behavior of locateRegionInMeta

2018-01-18 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331762#comment-16331762
 ] 

Sergey Soldatov commented on HBASE-19774:
-

[~mdrob] That's the interesting part. This incorrect behavior has existed for many 
releases already. As I already mentioned earlier, the exception would still be a 
TNFE but with a misleading message: in 1.x it is "A not found, got B"; in 2.x it 
would be a very scary message about corrupted meta. And actually, that was 
revealed because of HBASE-19775, when the shell received a TNFE wrapped in an 
UncheckedIOException, so it was unable to handle it properly. 

> incorrect behavior of locateRegionInMeta
> 
>
> Key: HBASE-19774
> URL: https://issues.apache.org/jira/browse/HBASE-19774
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Romil Choksi
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19774-branch-2.patch, 
> HBASE-19774-wip-branch-2.patch, HBASE-19774.master.patch
>
>
> When we try to operate on a non-existent table, in some circumstances we get 
> an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 
> 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is 
> expected in the table of 'nonExistentUsertable', but hbase:meta says it is in 
> the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19774) incorrect behavior of locateRegionInMeta

2018-01-18 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19774:

Attachment: HBASE-19774.master.patch

> incorrect behavior of locateRegionInMeta
> 
>
> Key: HBASE-19774
> URL: https://issues.apache.org/jira/browse/HBASE-19774
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Romil Choksi
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19774-branch-2.patch, 
> HBASE-19774-wip-branch-2.patch, HBASE-19774.master.patch
>
>
> When we try to operate on a non-existent table, in some circumstances we get 
> an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 
> 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is 
> expected in the table of 'nonExistentUsertable', but hbase:meta says it is in 
> the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19774) incorrect behavior of locateRegionInMeta

2018-01-18 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331558#comment-16331558
 ] 

Sergey Soldatov commented on HBASE-19774:
-

Attached the patch with a test that covers different table names.

> incorrect behavior of locateRegionInMeta
> 
>
> Key: HBASE-19774
> URL: https://issues.apache.org/jira/browse/HBASE-19774
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Romil Choksi
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19774-branch-2.patch, 
> HBASE-19774-wip-branch-2.patch
>
>
> When we try to operate on a non-existent table, in some circumstances we get 
> an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 
> 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is 
> expected in the table of 'nonExistentUsertable', but hbase:meta says it is in 
> the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-19774) incorrect behavior of locateRegionInMeta

2018-01-18 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19774:

Attachment: HBASE-19774-branch-2.patch

> incorrect behavior of locateRegionInMeta
> 
>
> Key: HBASE-19774
> URL: https://issues.apache.org/jira/browse/HBASE-19774
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0-beta-1
>Reporter: Romil Choksi
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19774-branch-2.patch, 
> HBASE-19774-wip-branch-2.patch
>
>
> When we try to operate on a non-existent table, in some circumstances we get 
> an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 
> 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is 
> expected in the table of 'nonExistentUsertable', but hbase:meta says it is in 
> the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-01-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328342#comment-16328342
 ] 

Sergey Soldatov commented on HBASE-19805:
-

An update: an additional check for an empty region location as well as for the 
CLOSING state in the checkSplittable method helps to avoid multiple splits for 
the same region and also solves this exception. But now I see that it may get 
stuck somewhere during the split execution, so I'm trying to find the reason 
and fix it before submitting a patch.
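For illustration, a hedged sketch of such a guard inside checkSplittable, assuming the 2.x master assignment API; the exact accessors on RegionStates/RegionStateNode, the getParentRegion() helper, and the use of DoNotRetryIOException are assumptions, not the committed patch:
{code}
// Sketch only: fail fast if the parent region has no location (it is being
// closed by an in-flight split) or is already in the CLOSING state.
RegionStateNode node = env.getAssignmentManager().getRegionStates()
    .getRegionStateNode(getParentRegion());            // accessor names assumed
ServerName location = (node == null) ? null : node.getRegionLocation();
if (location == null || node.isInState(RegionState.State.CLOSING)) {
  throw new DoNotRetryIOException(
      "Parent region is not splittable (no location or closing): " + getParentRegion());
}
{code}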

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while (numRegions == newRegionSize) {
>   Thread.sleep(50);
>   newRegionSize = admin.getRegions(tn).size();
> }
>   }
> {code}
> A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv 
> env,
>   final ServerName regionLocation, final RegionInfo hri, boolean 
> includeBestSplitRow)
>   throws IOException {
> // TODO: There is no timeout on this controller. Set one!
> 

[jira] [Comment Edited] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-01-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327990#comment-16327990
 ] 

Sergey Soldatov edited comment on HBASE-19805 at 1/16/18 11:27 PM:
---

well, here is a short RCA:
the checkSplittable() method relies on Region.isSplittable, which is just a simple 
check that the region is available (neither closing nor closed) and has no 
references. But the HRegion.closing flag is set only when we actually execute 
doClose(). At first glance, it would be reasonable to add a check to 
checkSplittable() that the region state (RegionStateNode) is not CLOSING. 


was (Author: sergey.soldatov):
well, here is a short RCA:
the checkSplittable method relies on Region.isSplittable, which is just a simple 
check that the region is available (neither closing nor closed) and has no 
references. But the HRegion.closing flag is set only when we actually execute 
doClose(). At first glance, it would be reasonable to add a check that the 
region state (RegionStateNode) is not CLOSING. 

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while 

[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-01-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327990#comment-16327990
 ] 

Sergey Soldatov commented on HBASE-19805:
-

well, here is a short RCA:
the checkSplittable method relies on Region.isSplittable, which is just a simple 
check that the region is available (neither closing nor closed) and has no 
references. But the HRegion.closing flag is set only when we actually execute 
doClose(). At first glance, it would be reasonable to add a check that the 
region state (RegionStateNode) is not CLOSING. 

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while (numRegions == newRegionSize) {
>   Thread.sleep(50);
>   newRegionSize = admin.getRegions(tn).size();
> }
>   }
> {code}
> A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv 
> env,
>   final ServerName regionLocation, final RegionInfo hri, boolean 
> includeBestSplitRow)
>   throws IOException {
> // TODO: There is no timeout on this 

[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-01-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327901#comment-16327901
 ] 

Sergey Soldatov commented on HBASE-19805:
-

[~stack] yep, that's exactly what I expected. Let me dig into why it happens on 
my own. 
[~elserj] You have an error in your code:
{noformat}
  Thread.sleep(50);
  newRegionSize = admin.getRegions(tn).size();
{noformat}
It should be numRegions, not newRegionSize. Otherwise this check is valid for 
the first split only, and that's why it runs splits for the same region without 
stopping. 
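One way to read that correction, as a hedged sketch of the toy program's wait loop (the exact placement of the re-baseline is an assumption):
{code}
// Sketch only: refresh the baseline region count after every split so the
// wait loop is effective for each iteration, not just the first one.
int numRegions = admin.getRegions(tn).size();
for (String splitPoint : splitPoints) {
  admin.split(tn, Bytes.toBytes(splitPoint));
  int newRegionSize = admin.getRegions(tn).size();
  while (numRegions == newRegionSize) {
    Thread.sleep(50);                                  // wait for the split to complete
    newRegionSize = admin.getRegions(tn).size();
  }
  numRegions = newRegionSize;                          // baseline for the next split
}
{code}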

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while (numRegions == newRegionSize) {
>   Thread.sleep(50);
>   newRegionSize = admin.getRegions(tn).size();
> }
>   }
> {code}
> A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv 
> env,
>   final ServerName regionLocation, final RegionInfo hri, boolean 
> includeBestSplitRow)
>   throws IOException {
> // TODO: There 

[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits

2018-01-16 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327873#comment-16327873
 ] 

Sergey Soldatov commented on HBASE-19805:
-

[~stack] almost. Actually, that happens when one split is already in progress. 
We are closing the parent (so we are setting its location to null), and if at 
this time we try to check whether this region is splittable, we hit this 
problem. I'm not sure yet why we allow splitting the same region many times. 
From my log, the scheduled splits are:
{noformat}
parent=af7ddfb3943627b825ddfe3fedb27590, 
daughterA=fde4b311dd76909e05cd57f2d19a8ebc, 
daughterB=4381a4dd9da46c6e7ce91ab6419fb708 
parent=af7ddfb3943627b825ddfe3fedb27590, 
daughterA=38d5fffbe693017be0d2fcc97eec3e3e, 
daughterB=e76bfc1dddfca511c00dfd3477dc003d 
parent=af7ddfb3943627b825ddfe3fedb27590, 
daughterA=ef89483bc4117a31536e1c25def4f64e, 
daughterB=ccf1c2f5f43f478af438b6c3f2ca7ef5 
{noformat}
And only the first one actually happens. 


> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.0.0-beta-1
>Reporter: Josh Elser
>Assignee: Sergey Soldatov
>Priority: Critical
> Fix For: 2.0.0-beta-2
>
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the 
> first few region splits, I see the following error in the Master log. 
> {noformat}
> 2018-01-16 14:07:52,797 INFO  
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: 
> Client=jelser//192.168.1.23 split 
> myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR 
> [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: 
> Unexpected throwable object
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at 
> org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at 
> org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at 
> org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103)
>   at 
> org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at 
> org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at 
> org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
> Configuration conf = HBaseConfiguration.create();
> try (Connection conn = ConnectionFactory.createConnection(conf);
> Admin admin = conn.getAdmin()) {
>   final TableName tn = TableName.valueOf("myTestTable");
>   if (admin.tableExists(tn)) {
> admin.disableTable(tn);
> admin.deleteTable(tn);
>   }
>   final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>   
> .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>   .build();
>   admin.createTable(desc);
>   List splitPoints = new ArrayList<>(16);
>   for (int i = 1; i <= 16; i++) {
> splitPoints.add(Integer.toString(i, 16));
>   }
>   
>   System.out.println("Splits: " + splitPoints);
>   int numRegions = admin.getRegions(tn).size();
>   for (String splitPoint : splitPoints) {
> System.out.println("Splitting on " + splitPoint);
> admin.split(tn, Bytes.toBytes(splitPoint));
> Thread.sleep(200);
> int newRegionSize = admin.getRegions(tn).size();
> while (numRegions == newRegionSize) {
>   Thread.sleep(50);
>   

[jira] [Updated] (HBASE-19775) hbase shell doesn't handle the exceptions that are wrapped in java.io.UncheckedIOException

2018-01-11 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated HBASE-19775:

Attachment: HBASE-19775-2-branch-2.patch

The updated version of the patch. Moved getCause before the individual command 
handlers. As for the UT, honestly speaking I'm not sure how to check that the 
upper-level shell commands handled the exception correctly. 

> hbase shell doesn't handle the exceptions that are wrapped in 
> java.io.UncheckedIOException
> --
>
> Key: HBASE-19775
> URL: https://issues.apache.org/jira/browse/HBASE-19775
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Affects Versions: 2.0.0-beta-1
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
> Fix For: 2.0.0-beta-2
>
> Attachments: HBASE-19775-2-branch-2.patch, HBASE-19775-branch-2.patch
>
>
> HBase shell doesn't have a notion of UncheckedIOException, so it may not 
> handle it correctly. For example, if we scan a non-existent table the error 
> looks weird:
> {noformat}
> hbase(main):001:0> scan 'a'
> ROW   
>   COLUMN+CELL
> ERROR: a
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

