[jira] [Commented] (HBASE-28055) Performance improvement for scan over several stores.
[ https://issues.apache.org/jira/browse/HBASE-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764012#comment-17764012 ] Sergey Soldatov commented on HBASE-28055: - [~bbeaudreault], I've added pull requests for branch-2.4, branch-2, and branch-2.5. That should fix the build failure. Sorry about that. > Performance improvement for scan over several stores. > -- > > Key: HBASE-28055 > URL: https://issues.apache.org/jira/browse/HBASE-28055 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-4, 2.5.5 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Fix For: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1, 4.0.0-alpha-1 > > > During the fix of HBASE-19863, an additional check for fake cells that > trigger reseek was added. It turns out that this check produces unnecessary > reseeks, because > matcher.compareKeyForNextColumn should be used only with indexed keys. Later > [~larsh] suggested doing a simple check for OLD_TIMESTAMP, and it looks like a > better solution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Work started] (HBASE-28055) Performance improvement for scan over several stores.
[ https://issues.apache.org/jira/browse/HBASE-28055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-28055 started by Sergey Soldatov. --- > Performance improvement for scan over several stores. > -- > > Key: HBASE-28055 > URL: https://issues.apache.org/jira/browse/HBASE-28055 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-4, 2.5.5 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > During the fix of HBASE-19863, an additional check for fake cells that > trigger reseek was added. It turns out that this check produces unnecessary > reseeks, because > matcher.compareKeyForNextColumn should be used only with indexed keys. Later > [~larsh] suggested doing a simple check for OLD_TIMESTAMP, and it looks like a > better solution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-28055) Performance improvement for scan over several stores.
Sergey Soldatov created HBASE-28055: --- Summary: Performance improvement for scan over several stores. Key: HBASE-28055 URL: https://issues.apache.org/jira/browse/HBASE-28055 Project: HBase Issue Type: Bug Affects Versions: 2.5.5, 3.0.0-alpha-4 Reporter: Sergey Soldatov Assignee: Sergey Soldatov During the fix of HBASE-19863, an additional check for fake cells that trigger reseek was added. It turns out that this check produces unnecessary reseeks, because matcher.compareKeyForNextColumn should be used only with indexed keys. Later [~larsh] suggested doing a simple check for OLD_TIMESTAMP, and it looks like a better solution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
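For illustration, the suggested timestamp-based check could look roughly like this. This is a minimal sketch, not the actual patch: the sentinel value and method name are assumptions modeled after HBase's oldest-timestamp constant, and the real change would live inside the scanner's reseek logic.

```java
public class FakeCellCheck {
  // Assumption: fake cells created for seeking carry this sentinel timestamp
  // (modeled after HConstants/PrivateConstants OLDEST_TIMESTAMP); the value
  // used in the actual fix may differ.
  public static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

  // A cheap check on the timestamp alone, instead of calling
  // matcher.compareKeyForNextColumn, which is only valid for indexed keys.
  public static boolean isFakeCell(long cellTimestamp) {
    return cellTimestamp == OLDEST_TIMESTAMP;
  }
}
```

The point of the change is that a plain long comparison avoids a full key comparison (and the reseek it could trigger) for every non-indexed cell.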
[jira] [Work started] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only
[ https://issues.apache.org/jira/browse/HBASE-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-27630 started by Sergey Soldatov. --- > hbase-spark bulkload stage directory limited to hdfs only > - > > Key: HBASE-27630 > URL: https://issues.apache.org/jira/browse/HBASE-27630 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 3.0.0-alpha-3 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > It's impossible to set the staging directory for the bulkload operation in > the spark-hbase connector to any filesystem other than hdfs. That might be a > problem for deployments where hbase.rootdir points to cloud storage. In this > case, an additional copy task from hdfs to cloud storage would be required > before loading hfiles into hbase. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27630) hbase-spark bulkload stage directory limited to hdfs only
Sergey Soldatov created HBASE-27630: --- Summary: hbase-spark bulkload stage directory limited to hdfs only Key: HBASE-27630 URL: https://issues.apache.org/jira/browse/HBASE-27630 Project: HBase Issue Type: Bug Components: spark Affects Versions: 3.0.0-alpha-3 Reporter: Sergey Soldatov Assignee: Sergey Soldatov It's impossible to set the staging directory for the bulkload operation in the spark-hbase connector to any filesystem other than hdfs. That might be a problem for deployments where hbase.rootdir points to cloud storage. In this case, an additional copy task from hdfs to cloud storage would be required before loading hfiles into hbase. -- This message was sent by Atlassian Jira (v8.20.10#820010)
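The gist of the fix is to derive the filesystem from the configured staging path itself rather than assuming the default (HDFS) filesystem. In real connector code this would be `stagingPath.getFileSystem(conf)` on a Hadoop `Path`; the sketch below only models the scheme resolution with `java.net.URI`, so the method name and fallback are illustrative assumptions:

```java
import java.net.URI;

public class StagingScheme {
  // Resolve the filesystem scheme from the staging directory URI; a path
  // with no scheme falls back to the cluster's default filesystem (here
  // passed in explicitly for the sake of the sketch).
  public static String schemeOf(String stagingDir, String defaultScheme) {
    String scheme = URI.create(stagingDir).getScheme();
    return scheme == null ? defaultScheme : scheme;
  }
}
```

With this approach, a staging dir like `s3a://bucket/staging` would be resolved against S3 directly, so no extra HDFS-to-cloud copy step is needed before loading.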
[jira] [Resolved] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.
[ https://issues.apache.org/jira/browse/HBASE-20719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov resolved HBASE-20719. - Resolution: Duplicate > HTable.batch() doesn't handle TableNotFound correctly. > -- > > Key: HBASE-20719 > URL: https://issues.apache.org/jira/browse/HBASE-20719 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 2.1.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Minor > > batch() as well as delete() are processed using AsyncRequest. Problems are > reported via RetriesExhaustedWithDetailsException, and there is no special > handling for the TableNotFound exception. So, the final result of running > batch or delete operations against a non-existent table looks really weird > and misleading: > {noformat} > hbase(main):003:0> delete 't1', 'r1', 'c1' > 2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot > get replica 0 location for > {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807} > ERROR: Failed 1 action: t1: 1 time, servers with issues: null > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HBASE-20926) IntegrationTestRSGroup is broken
[ https://issues.apache.org/jira/browse/HBASE-20926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov resolved HBASE-20926. - Resolution: Invalid The class was refactored into a number of smaller tests > IntegrationTestRSGroup is broken > - > > Key: HBASE-20926 > URL: https://issues.apache.org/jira/browse/HBASE-20926 > Project: HBase > Issue Type: Bug > Components: integration tests >Affects Versions: 2.1.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Fix For: 3.0.0-alpha-4 > > > There are several problems: > 1. It doesn't work in minicluster mode, because in afterMethod() it uses > IntegrationTestingUtility.restoreCluster(), which just shuts down the > minicluster when not in distributed mode > 2. It uses tests from TestRSGroups, which was supposed to be common for both > unit and integration tests, but over the last two years a number of tests > were added that use internal APIs and are not compatible with distributed > mode. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (HBASE-27397) Spark-hbase support for 'startWith' predicate.
Sergey Soldatov created HBASE-27397: --- Summary: Spark-hbase support for 'startWith' predicate. Key: HBASE-27397 URL: https://issues.apache.org/jira/browse/HBASE-27397 Project: HBase Issue Type: New Feature Components: hbase-connectors Affects Versions: 3.0.0-alpha-3 Reporter: Sergey Soldatov Currently, the spark-hbase connector doesn't support the StringStartsWith predicate and completely ignores it. This is a disadvantage compared to the Apache Phoenix connector and the old SHC (Hortonworks spark-hbase connector). It would be nice to have this functionality in the current connector as well. -- This message was sent by Atlassian Jira (v8.20.10#820010)
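A starts-with predicate maps naturally onto an HBase scan range: the start row is the prefix itself, and the stop row is the prefix with its last non-0xFF byte incremented (trailing 0xFF bytes dropped). HBase ships a similar helper for prefix scans, so the sketch below is illustrative of the technique rather than the connector's eventual implementation:

```java
import java.util.Arrays;

public class PrefixRange {
  // Compute the exclusive stop row for a scan over all rows starting with
  // the given prefix. If every byte is 0xFF there is no upper bound, and an
  // empty stop row means "scan to the end of the table".
  public static byte[] stopRowForPrefix(byte[] prefix) {
    byte[] stop = Arrays.copyOf(prefix, prefix.length);
    for (int i = stop.length - 1; i >= 0; i--) {
      if (stop[i] != (byte) 0xFF) {
        stop[i]++;                        // increment last non-0xFF byte
        return Arrays.copyOf(stop, i + 1); // drop the trailing 0xFF bytes
      }
    }
    return new byte[0]; // prefix was all 0xFF
  }
}
```

Pushing the predicate down this way turns a full-table scan with client-side filtering into a bounded server-side scan, which is the performance win the issue is after.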
[jira] [Created] (HBASE-27061) two phase bulkload is broken when SFT is in use.
Sergey Soldatov created HBASE-27061: --- Summary: two phase bulkload is broken when SFT is in use. Key: HBASE-27061 URL: https://issues.apache.org/jira/browse/HBASE-27061 Project: HBase Issue Type: Bug Affects Versions: 2.4.12 Reporter: Sergey Soldatov In HBASE-26707, for the SFT case we write files directly to the region location, using HRegion.regionDir as the staging directory. The problem is that this dir actually points to the WAL dir, so for S3 deployments it points to hdfs. As a result, during the execution of LoadIncrementalHFiles the process fails with the following exception: {noformat} 2022-05-24 03:31:23,656 ERROR org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to complete bulk load java.lang.IllegalArgumentException: Wrong FS hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da -expected s3a://hbase at org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224) at org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375) at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647) at org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337) at org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363) at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521) at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397) at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994) at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291) at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:360) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879) at org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266) at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453) at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
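The "Wrong FS" failure above comes from the filesystem's path check: a path is only valid on a filesystem whose scheme (and authority) it shares, and here the staging path derived from HRegion.regionDir is on hdfs:// while the store files live on s3a://. The sketch below models just the scheme part of that check with plain `java.net.URI` (the real check is `S3AFileSystem.checkPath`, which also compares authorities):

```java
import java.net.URI;

public class FsMatch {
  // A scheme-only model of FileSystem.checkPath: a scheme-less path is
  // resolved against the target filesystem and therefore always "belongs";
  // otherwise the schemes must agree.
  public static boolean belongsTo(String path, String fsUri) {
    String pathScheme = URI.create(path).getScheme();
    String fsScheme = URI.create(fsUri).getScheme();
    return pathScheme == null || pathScheme.equalsIgnoreCase(fsScheme);
  }
}
```

This is why staging must happen on the same filesystem as the final store-file location when SFT writes directly into the region directory.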
[jira] [Assigned] (HBASE-27061) two phase bulkload is broken when SFT is in use.
[ https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov reassigned HBASE-27061: --- Assignee: Sergey Soldatov > two phase bulkload is broken when SFT is in use. > > > Key: HBASE-27061 > URL: https://issues.apache.org/jira/browse/HBASE-27061 > Project: HBase > Issue Type: Bug >Affects Versions: 2.4.12 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > In HBASE-26707 for the SFT case, we are writing files directly to the region > location. For that we are using HRegion.regionDir as the staging directory. > The problem is that in reality, this dir is pointing to the WAL dir, so for > S3 deployments that would be pointing to the hdfs. As the result during the > execution of LoadIncrementalHFiles the process failed with the exception: > {noformat} > 2022-05-24 03:31:23,656 ERROR > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to > complete bulk load > java.lang.IllegalArgumentException: Wrong FS > hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da > -expected s3a://hbase > at > org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375) > at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) > at > 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397) > at > org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work started] (HBASE-27061) two phase bulkload is broken when SFT is in use.
[ https://issues.apache.org/jira/browse/HBASE-27061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-27061 started by Sergey Soldatov. --- > two phase bulkload is broken when SFT is in use. > > > Key: HBASE-27061 > URL: https://issues.apache.org/jira/browse/HBASE-27061 > Project: HBase > Issue Type: Bug >Affects Versions: 2.4.12 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > In HBASE-26707 for the SFT case, we are writing files directly to the region > location. For that we are using HRegion.regionDir as the staging directory. > The problem is that in reality, this dir is pointing to the WAL dir, so for > S3 deployments that would be pointing to the hdfs. As the result during the > execution of LoadIncrementalHFiles the process failed with the exception: > {noformat} > 2022-05-24 03:31:23,656 ERROR > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager: Failed to > complete bulk load > java.lang.IllegalArgumentException: Wrong FS > hdfs://ns1//hbase-wals/data/default/employees/4f367b303da4fed7667fff07fd4c6066/department/acd971097924463da6d6e3a15f9527da > -expected s3a://hbase > at > org.apache.hadoop.fs.s3native.S3xLoginHelper.checkPath(S3xLoginHelper.java:224) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.checkPath(S3AFileSystem.java:1375) > at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:647) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.makeQualified(S3AFileSystem.java:1337) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.qualify(S3AFileSystem.java:1363) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3521) > at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:511) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:397) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:387) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337) > at > 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$SecureBulkLoadListener.prepareBulkLoad(SecureBulkLoadManager.java:397) > at > org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:6994) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:291) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager$1.run(SecureBulkLoadManager.java:266) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:360) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1879) > at > org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.secureBulkLoadHFiles(SecureBulkLoadManager.java:266) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2453) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45821) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:392) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:140) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:359) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:339) > {noformat} -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work started] (HBASE-27053) IOException during caching of uncompressed block to the block cache.
[ https://issues.apache.org/jira/browse/HBASE-27053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-27053 started by Sergey Soldatov. --- > IOException during caching of uncompressed block to the block cache. > > > Key: HBASE-27053 > URL: https://issues.apache.org/jira/browse/HBASE-27053 > Project: HBase > Issue Type: Bug > Components: BlockCache >Affects Versions: 2.4.12 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > When prefetch to block cache is enabled and blocks are compressed sometimes > caching fails with the exception: > {noformat} > 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] > regionserver.HRegion: Could not initialize all stores for the > region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7. > 2022-05-18 21:37:29,598 WARN [RS_OPEN_REGION-regionserver/x1:16020-2] > regionserver.HRegion: Failed initialize of region= > cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., > starting to roll back memstore > java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached > block contents differ, which should not have > happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149) > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092) > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996) > at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134) > at > 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090) > at > org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147) > at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: java.lang.RuntimeException: Cached block > contents differ, which should not have > happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 > at > org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294) > at > org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344) > at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:294) > at > org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375) > at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115) > at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ... 
3 more > Caused by: java.lang.RuntimeException: Cached block contents differ, which > should not have > happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 > at > org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199) > at > org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432) > at > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418) > at > org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319) > at java.util.Optional.ifPresent(Optional.java:159) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942) > at > org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931) > at >
[jira] [Commented] (HBASE-27053) IOException during caching of uncompressed block to the block cache.
[ https://issues.apache.org/jira/browse/HBASE-27053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539859#comment-17539859 ] Sergey Soldatov commented on HBASE-27053: - So, why and when does this happen? The usual scenario is when a region split has just happened and the RS is trying to open both daughters and load them into the bucket cache. There is a single store that contains the split point, so two threads are reading the same block and trying to store it in the cache. When we decompress the block, the first thing we do is allocate space: [https://github.com/apache/hbase/blob/c7eb30d91015de67fb8207ac1818ce2a29dd60a4/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java#L649] Here we allocate space for the uncompressed data *AND* the checksum. But we fill only the data, without touching the checksum space, which may contain garbage from previous usage. So the block is cached with this garbage, and obviously that part will not match when we try to store the same block from another thread. Unfortunately, I was unable to create a reasonable unit test for this scenario, but here are the manual steps to reproduce: a single-node cluster with the following configuration tweaks: hbase.hregion.memstore.flush.size=100 hbase.hregion.max.filesize=1000 hbase.bucketcache.ioengine=file:/tmp/hbase_cache hbase.bucketcache.size=20 hbase.hfile.thread.prefetch=4 And the load is generated by: {noformat} hbase org.apache.hadoop.hbase.util.LoadTestTool -compression SNAPPY -write 1:10:100 -num_keys 1 {noformat} Usually, after 700k-1.2m records, the earlier mentioned exception appears in the RS log. So, to solve the problem I would suggest adding code that cleans up the checksum space when the decompression is completed. It doesn't look like an optimal solution, but right after decompression we don't know whether the checksum space will be used or not, so we cannot simply trim the bytebuff. 
> IOException during caching of uncompressed block to the block cache. > > > Key: HBASE-27053 > URL: https://issues.apache.org/jira/browse/HBASE-27053 > Project: HBase > Issue Type: Bug > Components: BlockCache >Affects Versions: 2.4.12 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > When prefetch to block cache is enabled and blocks are compressed sometimes > caching fails with the exception: > {noformat} > 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] > regionserver.HRegion: Could not initialize all stores for the > region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7. > 2022-05-18 21:37:29,598 WARN [RS_OPEN_REGION-regionserver/x1:16020-2] > regionserver.HRegion: Failed initialize of region= > cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., > starting to roll back memstore > java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached > block contents differ, which should not have > happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149) > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092) > at > org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996) > at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134) > at > org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090) > at > org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147) > at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.io.IOException: java.lang.RuntimeException: Cached block > contents differ, which should not have > happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 > at > org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294) > at > org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344) > at
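The proposed cleanup from the comment above can be sketched in a few lines. The buffer is allocated for uncompressed data plus checksum, but decompression fills only the data part; zeroing the checksum tail makes the cached bytes deterministic, so two threads caching the same block produce byte-identical contents. This is a plain `ByteBuffer` sketch (the real code would operate on HBase's ByteBuff abstraction, and the method name is illustrative):

```java
import java.nio.ByteBuffer;

public class ChecksumScrub {
  // Zero everything from the end of the uncompressed data to the buffer
  // limit, i.e. the checksum region that decompression never writes.
  // Absolute puts are used so position/limit are left untouched.
  public static void zeroChecksumRegion(ByteBuffer buf, int dataLen) {
    for (int i = dataLen; i < buf.limit(); i++) {
      buf.put(i, (byte) 0);
    }
  }
}
```

As the comment notes, trimming the buffer instead is not an option, because right after decompression it is not yet known whether the checksum space will be needed.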
[jira] [Created] (HBASE-27053) IOException during caching of uncompressed block to the block cache.
Sergey Soldatov created HBASE-27053: --- Summary: IOException during caching of uncompressed block to the block cache. Key: HBASE-27053 URL: https://issues.apache.org/jira/browse/HBASE-27053 Project: HBase Issue Type: Bug Components: BlockCache Affects Versions: 2.4.12 Reporter: Sergey Soldatov Assignee: Sergey Soldatov When prefetch to block cache is enabled and blocks are compressed sometimes caching fails with the exception: {noformat} 2022-05-18 21:37:29,597 ERROR [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Could not initialize all stores for the region=cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7. 2022-05-18 21:37:29,598 WARN [RS_OPEN_REGION-regionserver/x1:16020-2] regionserver.HRegion: Failed initialize of region= cluster_test,,1652935047946.a57ca5f9e7bebb4855a44523063f79c7., starting to roll back memstore java.io.IOException: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1149) at org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1092) at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:996) at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:946) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7240) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7199) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7175) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7134) at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7090) at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:147) at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:100) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 at org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:294) at org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:344) at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:294) at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6375) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1115) at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1112) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) ... 
3 more Caused by: java.lang.RuntimeException: Cached block contents differ, which should not have happened.cacheKey:19307adf1c2248ebb5675116ea640712.c3a21f2005abf308e4a8c9759d4e05fe_0 at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.validateBlockAddition(BlockCacheUtil.java:199) at org.apache.hadoop.hbase.io.hfile.BlockCacheUtil.shouldReplaceExistingCacheBlock(BlockCacheUtil.java:231) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.shouldReplaceExistingCacheBlock(BucketCache.java:447) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlockWithWait(BucketCache.java:432) at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.cacheBlock(BucketCache.java:418) at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.cacheBlock(CombinedBlockCache.java:60) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.lambda$readBlock$2(HFileReaderImpl.java:1319) at java.util.Optional.ifPresent(Optional.java:159) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.readBlock(HFileReaderImpl.java:1317) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.readAndUpdateNewBlock(HFileReaderImpl.java:942) at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl$HFileScannerImpl.seekTo(HFileReaderImpl.java:931) at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:171) at org.apache.hadoop.hbase.io.HalfStoreFileReader.getFirstKey(HalfStoreFileReader.java:321) at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:477) at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:490) at
[jira] [Commented] (HBASE-26972) Restored table from snapshot that has MOB is inconsistent
[ https://issues.apache.org/jira/browse/HBASE-26972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526717#comment-17526717 ] Sergey Soldatov commented on HBASE-26972: - The reason is this file: /hbase2.4/mobdir/data/default/table_1/1bccf339572b9a4db7475abcf57eeb8f/data/table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a which is a link to /hbase2.4/archive/data/default/table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a But HFileLink is unable to detect that this is a link, because the underscore in the middle of the file name doesn't match the pattern: {code} public static final String HFILE_NAME_REGEX = "[0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?"; {code} So isHFileLink returns false for such files, and that leads to unpredictable results. The easy way to fix it is to allow an underscore in the HFILE_NAME_REGEX, but I feel a bit uncomfortable about it. I would really appreciate suggestions. > Restored table from snapshot that has MOB is inconsistent > - > > Key: HBASE-26972 > URL: https://issues.apache.org/jira/browse/HBASE-26972 > Project: HBase > Issue Type: Bug > Components: mob, snapshots >Affects Versions: 3.0.0-alpha-2 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > When we restore a table from a snapshot and it has MOB files, there are links > that do not match the pattern of HFileLink. 
I'm not sure to which side effects > that might lead, but at least it's not possible to create a snapshot right > after the restore: > {quote} > Version 3.0.0-alpha-3-SNAPSHOT, rcd45cadbc1a42db359ff4e775cbd4b55cfe28140, > Fri Apr 22 03:04:25 PM PDT 2022 > Took 0.0016 seconds > > > hbase:001:0> list_snapshot > list_snapshot_sizes list_snapshots > hbase:001:0> list_snapshots > SNAPSHOT TABLE + CREATION TIME > > > t1 table_1 (2022-04-22 15:48:04 -0700) > > > 1 row(s) > Took 1.0881 seconds > > > => ["t1"] > hbase:002:0> restore_snapshot 't1' > Took 2.3942 seconds > > > hbase:003:0> snapshot > snapshot snapshot_cleanup_enabled snapshot_cleanup_switch > > hbase:003:0> snapshot 'table_1', 't2' > ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { > ss=t2 table=table_1 type=FLUSH ttl=0 } had an error. Procedure t2 { > waiting=[] done=[] } > at > org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:403) > at > org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1325) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:106) > at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:86) > Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via > Failed taking snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } due to > exception:Can't find hfile: > table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a > in the real > 
(hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) > or archive > (hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) > directory for the primary > table.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't > find hfile: >
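The mismatch described in the comment can be reproduced directly with the quoted HFILE_NAME_REGEX: a MOB file name is a hex prefix, an underscore, and a hex suffix, and the underscore is followed by neither "SeqId" nor "del", so the full-string match fails. A minimal sketch (the class name is illustrative, not HBase code):

```java
import java.util.regex.Pattern;

public class HFileNameRegexDemo {
    // Regex quoted from HFileLink in the comment above.
    static final String HFILE_NAME_REGEX = "[0-9a-f]+(?:(?:_SeqId_[0-9]+_)|(?:_del))?";
    static final Pattern HFILE_NAME_PATTERN = Pattern.compile(HFILE_NAME_REGEX);

    public static void main(String[] args) {
        // A plain store file name: all hex digits, so it matches.
        String regular = "bee397acc400449ea3a35ed3fc87fea1";
        // A MOB file name: hex, then '_', then more hex -- the underscore is not
        // followed by "SeqId" or "del", so matches() returns false.
        String mob = "bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613"
            + "_49a15ec2a84c8489965d1910a05cca3a";
        System.out.println(HFILE_NAME_PATTERN.matcher(regular).matches()); // true
        System.out.println(HFILE_NAME_PATTERN.matcher(mob).matches());     // false
    }
}
```

This also shows why simply allowing an underscore feels risky: the regex would then accept almost any name, weakening isHFileLink as a discriminator.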
[jira] [Created] (HBASE-26972) Restored table from snapshot that has MOB is inconsistent
Sergey Soldatov created HBASE-26972: --- Summary: Restored table from snapshot that has MOB is inconsistent Key: HBASE-26972 URL: https://issues.apache.org/jira/browse/HBASE-26972 Project: HBase Issue Type: Bug Components: mob, snapshots Affects Versions: 3.0.0-alpha-2 Reporter: Sergey Soldatov Assignee: Sergey Soldatov When we restore the table from snapshot and it has MOB files, there are links that do not fit the pattern of HFileLink. I'm not sure to which side effects that might lead, but at least it's not possible to create a snapshot right after the restore: {quote} Version 3.0.0-alpha-3-SNAPSHOT, rcd45cadbc1a42db359ff4e775cbd4b55cfe28140, Fri Apr 22 03:04:25 PM PDT 2022 Took 0.0016 seconds hbase:001:0> list_snapshot list_snapshot_sizes list_snapshots hbase:001:0> list_snapshots SNAPSHOT TABLE + CREATION TIME t1 table_1 (2022-04-22 15:48:04 -0700) 1 row(s) Took 1.0881 seconds => ["t1"] hbase:002:0> restore_snapshot 't1' Took 2.3942 seconds hbase:003:0> snapshot snapshot snapshot_cleanup_enabled snapshot_cleanup_switch hbase:003:0> snapshot 'table_1', 't2' ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } had an error. 
Procedure t2 { waiting=[] done=[] } at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:403) at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1325) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:106) at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:86) Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException via Failed taking snapshot { ss=t2 table=table_1 type=FLUSH ttl=0 } due to exception:Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a in the real (hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) or archive (hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) directory for the primary table.:org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a in the real (hdfs://localhost:8020/hbase2.4/mobdir/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) or archive 
(hdfs://localhost:8020/hbase2.4/archive/data/table_1/1bccf339572b9a4db7475abcf57eeb8f-table_1/1bccf339572b9a4db7475abcf57eeb8f/data/bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a) directory for the primary table. at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:82) at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:322) at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:392) ... 6 more Caused by: org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Can't find hfile: table_1=1bccf339572b9a4db7475abcf57eeb8f-bee397acc400449ea3a35ed3fc87fea1202204220b9b3b97b4fc42379a7b6455c3dc1613_49a15ec2a84c8489965d1910a05cca3a
[jira] [Commented] (HBASE-26767) Rest server should not use a large Header Cache.
[ https://issues.apache.org/jira/browse/HBASE-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496975#comment-17496975 ] Sergey Soldatov commented on HBASE-26767: - [~elserj] You are absolutely right. I was observing this issue with Kerberos authn: the same connection was used for different logins, and after a while, when the cache overflowed, the service started to return authn failures (401/403/400). > Rest server should not use a large Header Cache. > > > Key: HBASE-26767 > URL: https://issues.apache.org/jira/browse/HBASE-26767 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 2.4.9 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE > (65536). That's not compatible with jetty-9.4.x because the cache size is > limited by Character.MAX_VALUE - 1 (65534) there. According to the Jetty > source code comments, it's possible to have a buffer overflow in the cache > for higher values and that might lead to wrong/incomplete values returned by > cache and following incorrect header handling. > There are a couple of ways to fix it: > 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534 > 2. make header cache size configurable and set its size separately from the > header size. > I believe that the second would give us more flexibility. -- This message was sent by Atlassian Jira (v8.20.1#820001)
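The second option from the description (a separately configured cache size, clamped to what Jetty 9.4 can index) can be sketched roughly as follows; the class and method names are illustrative assumptions, not the actual HBase RESTServer code:

```java
public class HeaderCacheConfigDemo {
    // Default max header size passed to Jetty (64 KB), per the description above.
    static final int DEFAULT_HTTP_MAX_HEADER_SIZE = 64 * 1024; // 65536
    // Jetty 9.4 caps the header cache at Character.MAX_VALUE - 1.
    static final int MAX_JETTY_HEADER_CACHE_SIZE = Character.MAX_VALUE - 1; // 65534

    /** Pick a header cache size that never exceeds Jetty's limit. */
    static int headerCacheSize(int configured) {
        return Math.min(configured, MAX_JETTY_HEADER_CACHE_SIZE);
    }

    public static void main(String[] args) {
        // Passing the max header size straight through would exceed the cache limit:
        System.out.println(headerCacheSize(DEFAULT_HTTP_MAX_HEADER_SIZE)); // 65534
        // A separately configured, smaller cache size passes through unchanged:
        System.out.println(headerCacheSize(8192)); // 8192
        // The clamped value would then go to Jetty's
        // HttpConfiguration.setHeaderCacheSize(...) instead of the header size.
    }
}
```

Decoupling the two values also means a smaller cache can be chosen without shrinking the maximum accepted header size, which is the flexibility argued for in the description.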
[jira] [Work started] (HBASE-26767) Rest server should not use a large Header Cache.
[ https://issues.apache.org/jira/browse/HBASE-26767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26767 started by Sergey Soldatov. --- > Rest server should not use a large Header Cache. > > > Key: HBASE-26767 > URL: https://issues.apache.org/jira/browse/HBASE-26767 > Project: HBase > Issue Type: Bug > Components: REST >Affects Versions: 2.4.9 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE > (65536). That's not compatible with jetty-9.4.x because the cache size is > limited by Character.MAX_VALUE - 1 (65534) there. According to the Jetty > source code comments, it's possible to have a buffer overflow in the cache > for higher values and that might lead to wrong/incomplete values returned by > cache and following incorrect header handling. > There are a couple of ways to fix it: > 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534 > 2. make header cache size configurable and set its size separately from the > header size. > I believe that the second would give us more flexibility. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26767) Rest server should not use a large Header Cache.
Sergey Soldatov created HBASE-26767: --- Summary: Rest server should not use a large Header Cache. Key: HBASE-26767 URL: https://issues.apache.org/jira/browse/HBASE-26767 Project: HBase Issue Type: Bug Components: REST Affects Versions: 2.4.9 Reporter: Sergey Soldatov Assignee: Sergey Soldatov In the RESTServer we set the HeaderCache size to DEFAULT_HTTP_MAX_HEADER_SIZE (65536). That's not compatible with jetty-9.4.x because the cache size is limited by Character.MAX_VALUE - 1 (65534) there. According to the Jetty source code comments, it's possible to have a buffer overflow in the cache for higher values and that might lead to wrong/incomplete values returned by cache and following incorrect header handling. There are a couple of ways to fix it: 1. change the value of DEFAULT_HTTP_MAX_HEADER_SIZE to 65534 2. make header cache size configurable and set its size separately from the header size. I believe that the second would give us more flexibility. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Work started] (HBASE-26463) Unreadable table names after HBASE-24605
[ https://issues.apache.org/jira/browse/HBASE-26463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-26463 started by Sergey Soldatov. --- > Unreadable table names after HBASE-24605 > > > Key: HBASE-26463 > URL: https://issues.apache.org/jira/browse/HBASE-26463 > Project: HBase > Issue Type: Bug > Components: UI >Affects Versions: 3.0.0-alpha-1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Trivial > Attachments: After.png, Before.png > > > During the fix HBASE-24605 (Break long region names in the web UI) the 'word > break' rule was applied to the 'table-striped' style, so literally, all > tables in Master Web UI were affected. The most noticeable is the 'User > Tables' when table names, as well as descriptions, and even State are wrapped > in a very ugly way. > We have two options here: > 1. Fix the user tables only as it was done previously for the procedures table > 2. Apply 'word break' to those tables that require that. > Since most of the users are comfortable with the current UI we may go through > the first way, so the changes would affect only the User table. > Sample screenshots before and after are attached. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (HBASE-26463) Unreadable table names after HBASE-24605
Sergey Soldatov created HBASE-26463: --- Summary: Unreadable table names after HBASE-24605 Key: HBASE-26463 URL: https://issues.apache.org/jira/browse/HBASE-26463 Project: HBase Issue Type: Bug Components: UI Affects Versions: 3.0.0-alpha-1 Reporter: Sergey Soldatov Assignee: Sergey Soldatov Attachments: After.png, Before.png During the fix for HBASE-24605 (Break long region names in the web UI), the 'word break' rule was applied to the 'table-striped' style, so literally all tables in the Master Web UI were affected. The most noticeable is 'User Tables', where table names, as well as descriptions and even State, are wrapped in a very ugly way. We have two options here: 1. Fix the user tables only, as was done previously for the procedures table. 2. Apply 'word break' only to those tables that require it. Since most users are comfortable with the current UI, we may go the first way, so the changes would affect only the User table. Sample screenshots before and after are attached. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (HBASE-26267) Master initialization fails if Master Region WAL dir is missing
[ https://issues.apache.org/jira/browse/HBASE-26267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427210#comment-17427210 ] Sergey Soldatov commented on HBASE-26267: - [~zhangduo] [~zyork] do you have any other concerns regarding Josh's PR? Could we get it in? > Master initialization fails if Master Region WAL dir is missing > --- > > Key: HBASE-26267 > URL: https://issues.apache.org/jira/browse/HBASE-26267 > Project: HBase > Issue Type: Improvement > Components: master >Affects Versions: 2.4.6 >Reporter: Josh Elser >Assignee: Josh Elser >Priority: Major > Fix For: 2.5.0, 3.0.0-alpha-2, 2.4.8 > > > From a recent branch-2.4 build: > {noformat} > 2021-09-07 19:31:19,666 ERROR [master/localhost:16000:becomeActiveMaster] > master.HMaster(159): * ABORTING master localhost,16000,1631057476442: > Unhandled exception. Starting shutdown. * > java.io.FileNotFoundException: File > hdfs://localhost:8020/hbase-2.4-wals/MasterData/WALs does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119) > at > org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126) > at > org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:226) > at > org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:303) > at > org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104) > at > org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:839) > at > 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2189) > at > org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:512) > at java.lang.Thread.run(Thread.java:748) > {noformat} > If the WAL directory is missing but the Master Region already exists, we will > try to list the contents of the Master Region's WAL directory, which may or > may not exist. If we simply check first that the directory exists, then > the rest of the initialization code works as expected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
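The fix idea, checking for the WAL directory before listing it, can be sketched with java.nio in place of Hadoop's FileSystem API; creating the directory when it is missing is one way to make the listing safe on a fresh deploy, and all names here are placeholders:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class WalDirOpenDemo {
    /**
     * List WAL files, creating the directory first if it is missing, instead of
     * failing with FileNotFoundException on a bare listing (the aborted-master
     * case from the stack trace above).
     */
    static int countWalFiles(Path walDir) throws IOException {
        if (!Files.exists(walDir)) {
            Files.createDirectories(walDir);
        }
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(walDir)) {
            for (Path ignored : files) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Fresh deploy: the WALs subdirectory does not exist yet,
        // but the call still succeeds and reports zero WAL files.
        Path walDir = Files.createTempDirectory("masterdata").resolve("WALs");
        System.out.println(countWalFiles(walDir)); // 0
    }
}
```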
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617877#comment-16617877 ] Sergey Soldatov commented on HBASE-20952: - bq. However, when we think about hypothetical log systems, we could see an optimization where the filtering of _just one Region's_ edits from a log can be pushed down into the log system itself. Such a feature would remove the need for WAL splitting to happen universally in HBase. That's exactly what I wanted to say. There are some cases where we don't need to split WALs (the WAL-per-RS model, streams with filters, etc.). And the question is whether we need to make any significant refactoring in this area (i.e., make WALSplitter work with abstract WALs and create new WALs), or whether we can live with the case where WALProvider has an interface to get recovered edits for a particular region: for the current providers the result of the WALSplitter would be returned, while for custom providers it would be implementation-specific. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). 
We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615326#comment-16615326 ] Sergey Soldatov commented on HBASE-20952: - A few of my personal thoughts, based on previous experience implementing this kind of functionality in HBase (C5, non-stop hbase, hydrabase). It would be better to perform atomic, meaningful changes that address a particular problem without breaking the existing functionality and without adding additional complexity. We need a 'pluggable' WALProvider that will be agnostic to the file system structure. Fine. Let's make it configurable and remove the dependency on file systems (introducing WALIdentity or whatever we call it). It doesn't work with the current implementation of replication? Not good, but we may mention that in the Limitations section. All we need is an understanding that it's possible to implement, even if it would require a lot of effort. Possibly replication itself requires a new Jira, 'Revisit Replication approach', especially if we are talking about different kinds of WAL providers. Possibly someone would love to have a setup with replication between an on-prem cluster that is using a regular WAL and another in AWS using a BookKeeper WAL, and that would require another layer of abstraction for replication itself. Should it be done under this jira? Honestly speaking, I don't think so. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. 
We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615265#comment-16615265 ] Sergey Soldatov commented on HBASE-20952: - {quote}In the old time we will roll the wal writer, and it is done by RS, so closing the wal file is not enough, as the RS will try to open a new one and write to it. That's why we need to rename the wal directory. {quote} For a WAL provider that doesn't depend on the HDFS directory structure, there should be a manager that keeps information about existing logs. The internals are implementation-specific (e.g., for a Kafka WAL provider it may be a separate topic or some internal DB; for consensus-based logs like the Ratis LogService it might be a separate state machine), but any new log should be registered there. Adding a new method to WALProvider like 'disable'/'decommission' that would tell the manager to reject new logs for a particular RS (or even a region, if we consider a WAL-per-region schema) is not a problem. For the existing WAL providers, that method may rename the WAL directory. {quote}In your words above, it seems to me that we will only have one stream opened forever for a RS, then how do we drop the old edits after flush? And how do we setup the wal stream? Only once at the RS start up? And if there are errors later, we just abort? {quote} Not necessarily. There is no problem with having a WAL per region. Actually, in some cases it would be preferable, for example a Kafka topic per region: any kind of recovery would be a simple subscribe/replay of the particular topic. No log splits, less offline time. For the regular case, we are not talking about streams; it's just a WAL implementation that supports the append operation. For replication/recovery we should be able to get a stream and read from a particular ID/offset. Error handling should be hidden by the implementation. A simple example for a quorum-based implementation: we have a 3-node quorum for log 'RS1.1' (RS1, RS2, RS3). 
RS2 and RS3 went down for some reason, so we lost the majority and this quorum becomes read-only. A new log 'RS1.2' is created with the quorum (RS1, RS4, RS5) and all writes go there. But if we speak about the reading stream, it would provide a single instance that iterates through RS1.1 and RS1.2 continuously. The same approach may be applied to the existing WAL files as well. {quote}And for the FileSystem, we will use multi wal to increase the performance, and the logic is messed up with WALProvider. Does ratis still need multi wal to increase the performance? And if not, what's the plan? We need to refactor the multi wal related code, to not work against the WALProvider but something with the FileSystem related stuffs directly? {quote} That might be done in a further refactoring of multiwal. At the moment the approach is that we may specify a 3rd-party WAL provider class in WALFactory, and if it's there, multiwal would not be used at all, since that class becomes the provider. On the other hand, it could be refactored into something like a 'WAL strategy' that works with any kind of provider. {quote}had mentioned offline yesterday that he thinks some gaps still exist around WAL splitting – do you understand that well enough to suggest what needs to be addressed in the doc which is not already there? {quote} WALSplitter is a separate topic for discussion. The current implementation has a bunch of dependencies on file operations such as temporary files, the list of corrupted files, etc. From the HBase perspective, it would be much easier to keep it as is and make the log splitter an interface that takes a log and creates a list of recovery logs. But from the perspective of a 3rd-party WAL developer, it would be a nightmare to handle all possible cases and fit into the split-log chore logic. 
On the other hand, for the first iteration this may be hidden by a schema where a 3rd-party WAL does not use the splitter at all and recovery is done by reading a stream of records provided by the WALProvider for a particular region. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup
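The recovery model sketched in the comments above (no WAL splitting; the provider hands back a stream of recovered edits per region) could look roughly like the following contract. All names here are illustrative assumptions, not the actual HBase WAL API, and the in-memory provider merely stands in for a Kafka/Ratis/filesystem backend:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these are not the actual HBase WAL interfaces. */
public class RegionWalSketch {

    /** Provider-agnostic contract: recovery is a per-region edit stream. */
    interface RegionWalProvider {
        void append(String regionId, byte[] edit) throws IOException;

        /**
         * Recovered edits for one region, starting at a given offset. A filesystem
         * provider could back this with WALSplitter output; a Kafka-style provider
         * could simply replay the region's topic -- no splitting needed.
         */
        Iterator<byte[]> recoveredEdits(String regionId, long fromOffset) throws IOException;
    }

    /** Toy in-memory provider standing in for a real distributed log. */
    static class InMemoryProvider implements RegionWalProvider {
        private final Map<String, List<byte[]>> logs = new HashMap<>();

        public void append(String regionId, byte[] edit) {
            logs.computeIfAbsent(regionId, r -> new ArrayList<>()).add(edit);
        }

        public Iterator<byte[]> recoveredEdits(String regionId, long fromOffset) {
            List<byte[]> log = logs.getOrDefault(regionId, List.of());
            return log.subList((int) fromOffset, log.size()).iterator();
        }
    }

    public static void main(String[] args) throws IOException {
        RegionWalProvider wal = new InMemoryProvider();
        wal.append("region-1", "edit-a".getBytes());
        wal.append("region-1", "edit-b".getBytes());
        // Recovery for region-1 replays edits from offset 1 -- no splitter involved.
        Iterator<byte[]> recovered = wal.recoveredEdits("region-1", 1);
        System.out.println(new String(recovered.next())); // edit-b
    }
}
```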
[jira] [Commented] (HBASE-20952) Re-visit the WAL API
[ https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16602483#comment-16602483 ] Sergey Soldatov commented on HBASE-20952: - bq. Will the old FB's hydrabase impl help here in determining the APIs needed here? If we are talking about HBASE-12259, then nope. Actually, most of the work for Hydrabase went into the consensus protocol implementation, with only a few attempts to apply it to the WAL system itself (those were dropped, since the hbase-consensus module was not accepted). We don't want to add our own implementation of a quorum-based consensus protocol. We want to make the current WAL system flexible enough to build a new WAL implementation based either on some 3rd-party consensus protocol implementation (Raft/Paxos/etc.) or on any existing distributed log implementation (Apache Kafka, Apache BookKeeper, etc.). The interfaces should be simple, with a meaningful public contract, and the number of interfaces to implement should be reasonable as well. > Re-visit the WAL API > > > Key: HBASE-20952 > URL: https://issues.apache.org/jira/browse/HBASE-20952 > Project: HBase > Issue Type: Sub-task > Components: wal >Reporter: Josh Elser >Priority: Major > Attachments: 20952.v1.txt > > > Take a step back from the current WAL implementations and think about what an > HBase WAL API should look like. What are the primitive calls that we require > to guarantee durability of writes with a high degree of performance? > The API needs to take the current implementations into consideration. We > should also have a mind for what is happening in the Ratis LogService (but > the LogService should not dictate what HBase's WAL API looks like RATIS-272). > Other "systems" inside of HBase that use WALs are replication and > backup Replication has the use-case for "tail"'ing the WAL which we > should provide via our new API. B doesn't do anything fancy (IIRC). 
We > should make sure all consumers are generally going to be OK with the API we > create. > The API may be "OK" (or OK in a part). We need to also consider other methods > which were "bolted" on such as {{AbstractFSWAL}} and > {{WALFileLengthProvider}}. Other corners of "WAL use" (like the > {{WALSplitter}} should also be looked at to use WAL-APIs only). > We also need to make sure that adequate interface audience and stability > annotations are chosen. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558989#comment-16558989 ] Sergey Soldatov commented on HBASE-20657: - Oops. accidentally switched to branch-2 from master. Actually changes for MasterProcedureScheduler were added to master as part of HBASE-20569. Not sure what should we do for branch-2 in that case. WDYT [~elserj] > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: stack >Priority: Critical > Fix For: 3.0.0, 2.0.2 > > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, > HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. 
> [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
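The nonce problem from the description, where a client retry calls ng.newNonce() and so the server never recognizes the request as a duplicate, can be illustrated with a toy registry. All names here are hypothetical, not the actual procedure-v2 code:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of nonce-based procedure dedup; not actual HBase code. */
public class NonceDedupDemo {
    private final Map<Long, Long> runningByNonce = new HashMap<>();
    private long nextProcId = 1;

    /** Returns the existing procedure id for a repeated nonce, else starts a new one. */
    long submit(long nonce) {
        return runningByNonce.computeIfAbsent(nonce, n -> nextProcId++);
    }

    public static void main(String[] args) {
        NonceDedupDemo server = new NonceDedupDemo();
        long first = server.submit(42L);      // client's original call
        long retrySame = server.submit(42L);  // retry reusing the nonce: deduped
        long retryFresh = server.submit(43L); // retry with ng.newNonce(): NOT deduped
        System.out.println(first == retrySame);  // true
        // false -> a second, concurrent ModifyTableProcedure, which is how the
        // MODIFY_TABLE_REOPEN_ALL_REGIONS state ends up stuck on closed regions:
        System.out.println(first == retryFresh);
    }
}
```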
[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558973#comment-16558973 ] Sergey Soldatov commented on HBASE-20657: - Removed the internal lock implementation; thanks [~Apache9] for making those a part of the Procedure class. I still think that the changes in MasterProcedureScheduler.java should be applied. To reproduce the problem, make MTP hold the lock (as the patch does) and run the provided test: everything ends up with a number of regions stuck in the RIT state forever. The problem is in the state machine optimization, not in MTP itself, so it may happen with any other complex procedure that holds the lock. > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: stack >Priority: Critical > Fix For: 3.0.0, 2.0.2 > > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, > HBASE-20657-4-master.patch, HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. 
> 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
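The nonce behavior in step 1 can be illustrated with a minimal sketch. The class and method names below are hypothetical; HBase's real deduplication lives in the procedure framework and is keyed by (nonceGroup, nonce). The point is only that a retry which generates a fresh nonce, as modifyTableAsync does on every call, bypasses deduplication entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical names): a server-side registry that deduplicates
// submissions by nonce. A client retrying with the SAME nonce is handed back
// the existing procedure id; a client that generates a NEW nonce on each retry
// bypasses deduplication, and a second procedure starts executing.
public class NonceRegistry {
    private final Map<Long, Long> nonceToProcId = new HashMap<>();
    private long nextProcId = 1;

    // Returns the procedure id; registers a new procedure only for unseen nonces.
    public synchronized long submit(long nonce) {
        return nonceToProcId.computeIfAbsent(nonce, n -> nextProcId++);
    }
}
```

With this model, a retry carrying a fresh nonce yields a distinct procedure id, which is exactly the duplicated-ModifyTableProcedure scenario described above.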
[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov reassigned HBASE-20657: --- Assignee: stack (was: Sergey Soldatov) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov reassigned HBASE-20657: --- Assignee: Sergey Soldatov (was: stack) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Fix Version/s: 3.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Attachment: HBASE-20657-4-master.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558600#comment-16558600 ] Sergey Soldatov commented on HBASE-20927: - [~elserj] Almost. After some modifications (removing dependencies on the internal API from TestRSGroupBase) I noticed this. > RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not > processed yet. > > > Key: HBASE-20927 > URL: https://issues.apache.org/jira/browse/HBASE-20927 > Project: HBase > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Fix For: 3.0.0, 2.1.1 > > Attachments: HBASE-20927-master.patch, HBASE-20927.master.002.patch > > > Admin.clearDeadServers is supposed to return the list of servers that were > not cleared. But if RSGroupAdminEndpoint is set, a ConstraintException is > thrown: > {noformat} > Caused by: > org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException): > org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers > to remove cannot be null or empty.
> at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573) > at > org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540) > at > org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604) > at > org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {noformat} > That happens because postClearDeadServers calls > groupAdminServer.removeServers(clearedServer) even if clearedServer is > empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
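The root cause in the last sentence suggests a simple guard. The sketch below uses hypothetical, simplified names rather than the actual RSGroupAdminEndpoint code: the hook should forward to removeServers only when some servers were actually cleared, because removeServers rejects a null or empty set with a ConstraintException.

```java
import java.util.Set;

// Illustrative sketch (hypothetical names), not the actual HBase coprocessor:
// skip the downstream removeServers() call when the cleared set is empty,
// since that call rejects a null or empty set with a ConstraintException.
public class DeadServerCleaner {
    public static String postClearDeadServers(Set<String> clearedServers) {
        if (clearedServers == null || clearedServers.isEmpty()) {
            return "skipped";                  // nothing cleared -> nothing to remove
        }
        return "removed " + clearedServers.size();
    }
}
```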
[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20927: Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20927: Attachment: HBASE-20927.master.002.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558051#comment-16558051 ] Sergey Soldatov commented on HBASE-20927: - [~yuzhih...@gmail.com] added a test case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20927: Attachment: HBASE-20927-master.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
[ https://issues.apache.org/jira/browse/HBASE-20927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20927: Fix Version/s: 3.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20927) RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet.
Sergey Soldatov created HBASE-20927: --- Summary: RSGroupAdminEndpoint doesn't handle clearing dead servers if they are not processed yet. Key: HBASE-20927 URL: https://issues.apache.org/jira/browse/HBASE-20927 Project: HBase Issue Type: Bug Affects Versions: 2.1.0 Reporter: Sergey Soldatov Assignee: Sergey Soldatov Fix For: 2.1.1 Admin.clearDeadServers is supposed to return the list of servers that were not cleared. But if RSGroupAdminEndpoint is set, the ConstraintException is thrown: {noformat} Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.constraint.ConstraintException): org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty. at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:573) at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:519) at org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1607) at org.apache.hadoop.hbase.master.MasterCoprocessorHost$133.call(MasterCoprocessorHost.java:1604) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost$ObserverOperationWithoutResult.callObserver(CoprocessorHost.java:540) at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.execOperation(CoprocessorHost.java:614) at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1604) at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:2231) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:131) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) {noformat} That happens 
because in postClearDeadServers it calls groupAdminServer.removeServers(clearedServer) even if the clearedServer is empty. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20926) IntegrationTestRSGroup is broken
Sergey Soldatov created HBASE-20926: --- Summary: IntegrationTestRSGroup is broken Key: HBASE-20926 URL: https://issues.apache.org/jira/browse/HBASE-20926 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 2.1.0 Reporter: Sergey Soldatov Assignee: Sergey Soldatov Fix For: 2.1.1 There are several problems: 1. It doesn't work in minicluster mode because afterMethod() uses IntegrationTestingUtility.restoreCluster(), which simply shuts down the minicluster in non-distributed mode. 2. It uses tests from TestRSGroups, which was supposed to be common to both unit and integration tests, but over the last two years a number of tests were added that use the internal API and are not compatible with distributed mode. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20784) Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager
[ https://issues.apache.org/jira/browse/HBASE-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535263#comment-16535263 ] Sergey Soldatov commented on HBASE-20784: - [~Apache9] it works perfectly for me. Thank you! > Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager > > > Key: HBASE-20784 > URL: https://issues.apache.org/jira/browse/HBASE-20784 > Project: HBase > Issue Type: Bug > Components: master, UI >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Minor > Fix For: 3.0.0, 2.2.0, 2.1.1 > > Attachments: HBASE-20784.patch > > > In HBASE-20722 we removed the usage of RegionServerTracker when getting > information for a region server. The version in server manager is an int, and we > convert it to a String when displaying it on the master UI, so we lose > the SNAPSHOT suffix. Not a big one, as this is not a problem for normal > releases. Opening an issue for it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20784) Will lose the SNAPSHOT suffix if we get the version of RS from ServerManager
[ https://issues.apache.org/jira/browse/HBASE-20784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534361#comment-16534361 ] Sergey Soldatov commented on HBASE-20784: - The problem is wider than just losing the SNAPSHOT suffix. 1. It may completely lose the RS version (it shows 0.0.0 in the master UI for version 2.1-SNAPSHOT). 2. It affects any custom build, i.e. builds from CI with the build number as a suffix. The build number will be stripped, which may be a real issue for devops during a rolling upgrade on large clusters. 3. Any vendor-specific builds. 4. It triggers the report about an inconsistent version between the Master and RSes. We may consider using int versions on the master side as well, but that would solve (4) only. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20852) inconsistent version report in Master UI after HBASE-20722
Sergey Soldatov created HBASE-20852: --- Summary: inconsistent version report in Master UI after HBASE-20722 Key: HBASE-20852 URL: https://issues.apache.org/jira/browse/HBASE-20852 Project: HBase Issue Type: Bug Affects Versions: 3.0.0, 2.1.0 Reporter: Sergey Soldatov The Master web UI is able to report whether the versions of an RS and the master differ. Previously the check was performed between the master version obtained from the Version class (generated during the build) and the RS version from RegionServerTracker (which used the same Version class). HBASE-20722 changed this behavior: now for an RS we get a numeric version and convert it to a string representation. That works only for numeric versions like 2.0.0 / 2.1.0, but it doesn't work for -SNAPSHOT or any other custom versions, and the Master UI reports that all region servers have an inconsistent version. [~Apache9], [~stack], [~tedyu] FYI -- This message was sent by Atlassian JIRA (v7.6.3#76005)
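The information loss described above can be demonstrated with a toy round-trip. The bit-packing scheme below is hypothetical and exists only to illustrate the point: an int can carry the major.minor.patch components, but any suffix such as "-SNAPSHOT" or a CI build number cannot survive the encoding, so the string reconstructed on the master side can never match the RS's real version string.

```java
// Sketch of why an int-encoded version cannot round-trip: only the numeric
// major.minor.patch components survive, so a suffix like "-SNAPSHOT" (or a
// vendor/CI build number) is dropped on the way back to a String.
// The packing layout here is illustrative, not HBase's actual scheme.
public class VersionRoundTrip {
    // Pack major/minor/patch into one int; everything after '-' is discarded.
    public static int encode(String version) {
        String numeric = version.split("-")[0];      // "2.1.0-SNAPSHOT" -> "2.1.0"
        String[] parts = numeric.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        int patch = parts.length > 2 ? Integer.parseInt(parts[2]) : 0;
        return (major << 20) | (minor << 10) | patch;
    }

    // Reconstruct a version string; the suffix is gone for good.
    public static String decode(int encoded) {
        return ((encoded >> 20) & 0x3FF) + "." + ((encoded >> 10) & 0x3FF) + "." + (encoded & 0x3FF);
    }
}
```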
[jira] [Commented] (HBASE-20826) Truncate responseInfo attributes on RpcServer WARN messages
[ https://issues.apache.org/jira/browse/HBASE-20826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528467#comment-16528467 ] Sergey Soldatov commented on HBASE-20826: - [~yuzhih...@gmail.com] Actually my point is that this problem should be handled on both sides, so I'm happy with the suggested patch, and I'm going to do some work on the Phoenix side as well. I just want to be sure that I would have a full log at TRACE level (sometimes we need to find duplicated values in the filter, and that would not be possible without it). > Truncate responseInfo attributes on RpcServer WARN messages > --- > > Key: HBASE-20826 > URL: https://issues.apache.org/jira/browse/HBASE-20826 > Project: HBase > Issue Type: Improvement > Components: rpc >Reporter: Sergey Soldatov >Assignee: Josh Elser >Priority: Major > Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0 > > Attachments: HBASE-20826.001.branch-2.0.patch, > HBASE-20826.002.branch-2.0.patch > > > With Phoenix in the picture, dumping the {{Call}} protobuf to the RS log can > get *really* chatty, real fast. Notably, some serialized filters just spam > the log with binary garbage. > Let's add an upper-limit to the length of params we'll put out at WARN, and > leave the full content for TRACE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20826) Truncate responseInfo attributes on RpcServer WARN messages
[ https://issues.apache.org/jira/browse/HBASE-20826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528303#comment-16528303 ] Sergey Soldatov commented on HBASE-20826: - In the first place, this should be handled at the Filter level as well (Phoenix filters should be aware of what they are putting in toString; otherwise they would spend too much time just building 10-20 MB strings and would increase the memory pressure). And at TRACE level we shouldn't truncate anything; otherwise it may make it harder to identify the root cause of a problem. Just my 2 cents. > Truncate responseInfo attributes on RpcServer WARN messages > --- > > Key: HBASE-20826 > URL: https://issues.apache.org/jira/browse/HBASE-20826 > Project: HBase > Issue Type: Improvement > Components: rpc >Reporter: Sergey Soldatov >Assignee: Josh Elser >Priority: Major > Fix For: 3.0.0, 2.1.0, 2.0.2, 2.2.0 > > Attachments: HBASE-20826.001.branch-2.0.patch, > HBASE-20826.002.branch-2.0.patch > > > With Phoenix in the picture, dumping the {{Call}} protobuf to the RS log can > get *really* chatty, real fast. Notably, some serialized filters just spam > the log with binary garbage. > Let's add an upper-limit to the length of params we'll put out at WARN, and > leave the full content for TRACE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
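The WARN-vs-TRACE split under discussion can be sketched like this. The limit and method name below are illustrative, not the actual patch: the idea is simply to cap what WARN prints while pointing the reader at TRACE for the untruncated content.

```java
// Illustrative sketch: cap a logged parameter string at WARN level while
// keeping the full content available at TRACE (limit value is hypothetical).
public class ParamTruncation {
    static final int WARN_LIMIT = 150;

    // What the WARN-level log line would carry for a given parameter dump.
    static String forWarn(String param) {
        if (param.length() <= WARN_LIMIT) {
            return param;
        }
        return param.substring(0, WARN_LIMIT)
            + "... [truncated, " + param.length()
            + " chars total; full content at TRACE]";
    }

    public static void main(String[] args) {
        // Stand-in for a multi-megabyte serialized Phoenix filter.
        String big = "x".repeat(10_000);
        // The WARN line stays short instead of spamming binary garbage.
        System.out.println(forWarn(big).length());
    }
}
```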
[jira] [Created] (HBASE-20719) HTable.batch() doesn't handle TableNotFound correctly.
Sergey Soldatov created HBASE-20719: --- Summary: HTable.batch() doesn't handle TableNotFound correctly. Key: HBASE-20719 URL: https://issues.apache.org/jira/browse/HBASE-20719 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.1.0 Reporter: Sergey Soldatov Assignee: Sergey Soldatov Fix For: 2.1.0 batch() as well as delete() are processed using AsyncRequest. To report problems we use RetriesExhaustedWithDetailsException, and there is no special handling for the TableNotFound exception. So the final result of running batch or delete operations against a nonexistent table looks really weird and misleading: {noformat} hbase(main):003:0> delete 't1', 'r1', 'c1' 2018-06-12 15:02:50,742 ERROR [main] client.AsyncRequestFutureImpl: Cannot get replica 0 location for {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807} ERROR: Failed 1 action: t1: 1 time, servers with issues: null {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502708#comment-16502708 ] Sergey Soldatov commented on HBASE-20657: - [~stack] Actually the patch I provided makes MTP hold the lock, so while it's not finished it will keep the exclusive lock (exactly what [~Apache9] mentioned in the last comment). But that leads to the concurrent-execution problem that I've addressed in the MasterProcedureScheduler changes. So the patch fixes the case when the table is healthy, all regions are online, and 2 concurrent MTPs are executed. > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: stack >Priority: Critical > Fix For: 2.0.1 > > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, > HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. 
When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Attachment: HBASE-20657-3-branch-2.patch > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-2-branch-2.patch, HBASE-20657-3-branch-2.patch, > HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Attachment: HBASE-20657-2-branch-2.patch > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-2-branch-2.patch, HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Status: Patch Available (was: Open) Changes that I've made: 1. Made ModifyTableProcedure hold a lock. That prevents concurrent execution of ModifyTable procedures. 2. Added an additional condition for when we may remove the queue (TableQueue in this case) from the fairqueue. We remove it only if the next procedure not only has a different parent but also doesn't belong to the current lock. We need that to prevent a race condition that may happen during the execution of a complex procedure that has a chain of subprocedures. > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. 
When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
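The amended dequeue condition described in the status update above can be sketched as follows. Names and fields are illustrative, not the actual MasterProcedureScheduler code: the point is that removing the table queue as soon as the next procedure has a different parent is unsafe when one root procedure spawns several levels of subprocedures, so the fix also checks whether the next procedure belongs to the lock-holding procedure.

```java
// Illustrative sketch of the dequeue condition in the patch description.
public class SchedulerSketch {
    static class Proc {
        final long procId;
        final long parentId;   // direct parent (0 = none)
        final long rootId;     // root of the subprocedure chain
        Proc(long procId, long parentId, long rootId) {
            this.procId = procId;
            this.parentId = parentId;
            this.rootId = rootId;
        }
    }

    // Before the fix: drop the table queue whenever the next procedure has
    // a different parent. With 3 levels of subprocedures running
    // concurrently, siblings with different parents but the same root can
    // have the queue removed underneath them, leaving regions stuck in RIT.
    static boolean removeQueueOld(Proc next, long lastParent) {
        return next.parentId != lastParent;
    }

    // After the fix: additionally require that the next procedure does not
    // belong to the procedure currently holding the exclusive lock.
    static boolean removeQueueNew(Proc next, long lastParent, long lockOwnerRoot) {
        return next.parentId != lastParent && next.rootId != lockOwnerRoot;
    }

    public static void main(String[] args) {
        // MTP pid=1 holds the lock; next subprocedure has parent 3, root 1,
        // while the previously polled one had parent 2.
        Proc next = new Proc(5, 3, 1);
        System.out.println(removeQueueOld(next, 2));    // queue dropped (bug)
        System.out.println(removeQueueNew(next, 2, 1)); // queue kept (fix)
    }
}
```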
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Attachment: HBASE-20657-1-branch-2.patch > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-1-branch-2.patch, > HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16501001#comment-16501001 ] Sergey Soldatov commented on HBASE-20657: - [~elserj] thank you for pointing that out. I've checked HBASE-20634 already ([~an...@apache.org] mentioned it offline). Those bugs are unrelated, and the failure is reproducible on top of branch-2. > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Attachment: HBASE-20657-testcase-branch2.patch > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498727#comment-16498727 ] Sergey Soldatov commented on HBASE-20657: - Let's consider the case that [~an...@apache.org] found. Attaching the test case to reproduce the problem. What happens: 1. The first ModifyTableProcedure (MTP) gets the exclusive lock, performs all meta modifications, and creates subprocedures to reopen all regions. All those subprocedures go to the TableQueue. After that, it releases the lock. 2. The second MTP gets the exclusive lock, performs all meta modifications, and tries to create subprocedures. Since during the reopen we create MoveRegionProcedure, we immediately fail in the constructor of MoveRegionProcedure, because some regions may be closed at that moment due to the first MTP. 3. Since the lock was not released in (2), the scheduler will continuously try to execute the second MTP, failing over and over again. [~stack] that's the problem I've tried to describe in the discussion in HBASE-20202. There are several directions (not ways, because none of them actually works) to solve it: 1. Remove the call of openRegion in the MoveRegionProcedure constructor. We would still have the problem that during the execution of the first MTP's subprocedures, metadata will be updated with new values from the second MTP, and region opens would start failing (like in the example - the report will be that the compression is incorrect). 2. Make MTP hold a lock. That doesn't work either. Even for a single MTP call, it gets stuck if there is more than one worker for proc-v2. 
Reason - MasterProcedureScheduler#doPoll : https://github.com/apache/hbase/blob/74ef118e9e2246c09280ebb7eb6552ef91bdd094/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/MasterProcedureScheduler.java#L200 Since MTP creates 3 levels of subprocedures, due to the concurrent execution the queue may contain a mix of subprocedures that have different parents, and all workers may remove that table queue from the fairqueue at the same time, so there will be nothing for them to execute and some regions would get stuck in the RIT state. Not sure, but there is a chance that this is a side effect of HBASE-2, so FYI [~Apache9] Another question is related to (1) from the description: is it expected that during a retry from the client we generate a new nonce key for the same procedure? Any thoughts, suggestions or directions are much appreciated. Also FYI [~an...@apache.org], [~elserj] > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-20657-testcase-branch2.patch > > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. 
When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov reassigned HBASE-20657: --- Assignee: Sergey Soldatov > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may stuck
Sergey Soldatov created HBASE-20657: --- Summary: Retrying RPC call for ModifyTableProcedure may stuck Key: HBASE-20657 URL: https://issues.apache.org/jira/browse/HBASE-20657 Project: HBase Issue Type: Bug Components: Client, proc-v2 Affects Versions: 2.0.0 Reporter: Sergey Soldatov Env: 2 masters, 1 RS. Steps to reproduce: Active master is killed while ModifyTableProcedure is executed. If the table has enough regions it may come that when the secondary master get active some of the regions may be closed, so once client retries the call to the new active master, a new ModifyTableProcedure is created and get stuck during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: 1. When we are retrying from client side, we call modifyTableAsync which create a procedure with a new nonce key: {noformat} ModifyTableRequest request = RequestConverter.buildModifyTableRequest( td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); {noformat} So on the server side, it's considered as a new procedure and starts executing immediately. 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create MoveRegionProcedure for each region, but it checks whether the region is online (and it's not), so it fails immediately, forcing the procedure to restart. [~an...@apache.org] saw a similar case when two concurrent ModifyTable procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
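The nonce problem in point (1) of the description can be sketched as follows. The registry below is illustrative, not the actual ProcedureExecutor nonce handling: a retry that reuses the original nonce would be matched to the already-running procedure, but because modifyTableAsync calls ng.newNonce() on each attempt, the server sees a brand-new (nonceGroup, nonce) pair and starts a second procedure.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of nonce-based procedure deduplication.
public class NonceRegistry {
    private final Map<String, Long> running = new HashMap<>();
    private long nextProcId = 1;

    // Returns the existing procedure id when the (group, nonce) pair was
    // seen before, otherwise registers and returns a new procedure id.
    public long submit(long nonceGroup, long nonce) {
        String key = nonceGroup + ":" + nonce;
        return running.computeIfAbsent(key, k -> nextProcId++);
    }

    public static void main(String[] args) {
        NonceRegistry reg = new NonceRegistry();
        long first = reg.submit(1, 42);
        // Retry with the SAME nonce: deduplicated to the same procedure.
        System.out.println(first == reg.submit(1, 42));
        // Retry with a NEW nonce (what ng.newNonce() produces): the server
        // treats it as a new procedure and executes it immediately.
        System.out.println(first == reg.submit(1, 43));
    }
}
```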
[jira] [Updated] (HBASE-20657) Retrying RPC call for ModifyTableProcedure may get stuck
[ https://issues.apache.org/jira/browse/HBASE-20657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-20657: Summary: Retrying RPC call for ModifyTableProcedure may get stuck (was: Retrying RPC call for ModifyTableProcedure may stuck) > Retrying RPC call for ModifyTableProcedure may get stuck > > > Key: HBASE-20657 > URL: https://issues.apache.org/jira/browse/HBASE-20657 > Project: HBase > Issue Type: Bug > Components: Client, proc-v2 >Affects Versions: 2.0.0 >Reporter: Sergey Soldatov >Priority: Major > > Env: 2 masters, 1 RS. > Steps to reproduce: Active master is killed while ModifyTableProcedure is > executed. > If the table has enough regions it may come that when the secondary master > get active some of the regions may be closed, so once client retries the call > to the new active master, a new ModifyTableProcedure is created and get stuck > during MODIFY_TABLE_REOPEN_ALL_REGIONS state handling. That happens because: > 1. When we are retrying from client side, we call modifyTableAsync which > create a procedure with a new nonce key: > {noformat} > ModifyTableRequest request = > RequestConverter.buildModifyTableRequest( > td.getTableName(), td, ng.getNonceGroup(), ng.newNonce()); > {noformat} > So on the server side, it's considered as a new procedure and starts > executing immediately. > 2. When we are processing MODIFY_TABLE_REOPEN_ALL_REGIONS we create > MoveRegionProcedure for each region, but it checks whether the region is > online (and it's not), so it fails immediately, forcing the procedure to > restart. > [~an...@apache.org] saw a similar case when two concurrent ModifyTable > procedures were running and got stuck in the similar way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HBASE-20621) Unclear error for deleting from not existing table.
Sergey Soldatov created HBASE-20621: --- Summary: Unclear error for deleting from not existing table. Key: HBASE-20621 URL: https://issues.apache.org/jira/browse/HBASE-20621 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.0.0 Reporter: Sergey Soldatov When I try to delete a row from a nonexistent table, the error is quite confusing. Instead of getting a table-not-found exception I get {noformat} ERROR [main] client.AsyncRequestFutureImpl: Cannot get replica 0 location for {"totalColumns":1,"row":"r1","families":{"c1":[{"qualifier":"","vlen":0,"tag":[],"timestamp":9223372036854775807}]},"ts":9223372036854775807} ERROR: Failed 1 action: t1: 1 time, servers with issues: null {noformat} That happens because delete uses AsyncRequestFuture, which wraps all region-location errors into a 'Cannot get replica' error. I expect that other actions like batch, mutateRow, checkAndDelete behave in the same way. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
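The way the aggregated report hides the real cause can be sketched like this. The summary builder below is illustrative (the real RetriesExhaustedWithDetailsException carries per-action throwables internally); the point is that the user-facing message counts failures per table without ever surfacing the underlying table-not-found cause.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: aggregating per-action failures into a count-based
// summary makes a missing table indistinguishable from a flaky server.
public class AggregatedErrors {
    // Build a "Failed N action(s)" style summary from the per-action causes.
    static String summarize(String table, List<Throwable> causes) {
        return "Failed " + causes.size() + " action: " + table + ": "
            + causes.size() + " time, servers with issues: null";
    }

    public static void main(String[] args) {
        List<Throwable> causes = new ArrayList<>();
        // The real cause is known at the point of failure...
        causes.add(new RuntimeException("TableNotFoundException: t1"));
        // ...but the summary the user sees never mentions it.
        System.out.println(summarize("t1", causes));
    }
}
```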
[jira] [Commented] (HBASE-20202) [AMv2] Don't move region if its a split parent or offlined
[ https://issues.apache.org/jira/browse/HBASE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16478340#comment-16478340 ] Sergey Soldatov commented on HBASE-20202: - bq. On the the new master, it will notice this and re-run the step that was doing the close step... That is what should happen sir. Correct. It happens almost in that way, but in my case it's a bit more complicated: we had modified a table (the chaos monkey changed the bloom filter), and while the ModifyTableProcedure was being executed, the master was killed during execution of an UnassignProcedure, when one of the regions was closing. The new master had the following list of unfinished procedures during startup: UnassignProcedure pid=766, ppid=754 MoveRegionProcedure pid=754, ppid=749 ModifyTableProcedure pid=749 Once the master got online, the UnassignProcedure started: {noformat} 2018-05-15 21:48:23,787 INFO [PEWorker-8] assignment.RegionStateStore: pid=766 updating hbase:meta row=92e0d39ee7e6d19566c393bae58ab5c0, regionState=CLOSING 2018-05-15 21:48:23,820 INFO [PEWorker-8] assignment.RegionTransitionProcedure: Dispatch pid=766, ppid=754, state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure table=IntegrationTestBigLinkedList, region=92e0d39ee7e6d19566c393bae58ab5c0, server=ctr-e138-15181 {noformat} Not sure what exactly triggered it (some retry logic from the client? I don't see any logs from the chaos monkey for that action), but suddenly the master was trying to execute another ModifyTableProcedure. And it started from the state MODIFY_TABLE_REOPEN_ALL_REGIONS, which tries to reopen all regions using MoveRegionProcedure. 
And because our region is not online, it fails due to that checkOnline check: {noformat} 2018-05-15 21:48:29,684 INFO [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native 2018-05-15 21:48:29,685 INFO [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] compress.CodecPool: Got brand-new compressor [.bz2] 2018-05-15 21:48:29,744 INFO [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] compress.CodecPool: Got brand-new compressor [.lz4] 2018-05-15 21:48:29,788 INFO [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] master.HMaster: Client=hbase//172.27.86.70 modify IntegrationTestBigLinkedList 2018-05-15 21:48:29,902 DEBUG [RpcServer.default.FPBQ.Fifo.handler=27,queue=0,port=2] procedure2.ProcedureExecutor: Stored pid=790, state=RUNNABLE:MODIFY_TABLE_PREPARE; ModifyTableProcedure table=IntegrationTestBigLinkedList 2018-05-15 21:48:29,969 DEBUG [PEWorker-4] util.FSTableDescriptors: Wrote into hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.06 2018-05-15 21:48:29,972 DEBUG [PEWorker-4] util.FSTableDescriptors: Deleted hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.05 2018-05-15 21:48:29,972 INFO [PEWorker-4] util.FSTableDescriptors: Updated tableinfo=hdfs://mycluster/apps/hbase/data/data/default/IntegrationTestBigLinkedList/.tabledesc/.tableinfo.06 2018-05-15 21:48:30,004 WARN [PEWorker-4] procedure.ModifyTableProcedure: Retriable error trying to modify table=IntegrationTestBigLinkedList (in state=MODIFY_TABLE_REOPEN_ALL_REGIONS) org.apache.hadoop.hbase.client.DoNotRetryRegionException: 92e0d39ee7e6d19566c393bae58ab5c0 is not OPEN at org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:193) at org.apache.hadoop.hbase.master.assignment.MoveRegionProcedure.(MoveRegionProcedure.java:67) at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.createMoveRegionProcedure(AssignmentManager.java:767) at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createReopenProcedures(AssignmentManager.java:705) at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:128) at org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.executeFromState(ModifyTableProcedure.java:50) at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184) at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1472) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1240) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75) at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1760) 2018-05-15 21:48:30,027 WARN [PEWorker-4] procedure.ModifyTableProcedure: Retriable error trying to modify
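The hazard shown in the stack trace above can be reduced to a small sketch. The classes below are illustrative, not the real MoveRegionProcedure: a precondition checked in the constructor is fine on first submission, but after a master failover the same precondition can reject a region that a crashed Unassign legitimately left in a non-OPEN state, so the reopen step cannot even construct its subprocedure.

```java
// Illustrative sketch of a checkOnline-style precondition enforced in a
// procedure constructor, and why it bites during failover recovery.
public class ConstructorCheckSketch {
    enum RegionState { OPEN, CLOSING, CLOSED }

    static class MoveRegion {
        MoveRegion(RegionState state) {
            // checkOnline-style precondition: reject anything not OPEN.
            if (state != RegionState.OPEN) {
                throw new IllegalStateException(state + " is not OPEN");
            }
        }
    }

    public static void main(String[] args) {
        // Fresh submission against a healthy region: constructs fine.
        new MoveRegion(RegionState.OPEN);
        try {
            // After failover: the region was mid-unassign, so the rebuilt
            // reopen step fails before the subprocedure even exists.
            new MoveRegion(RegionState.CLOSED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```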
[jira] [Commented] (HBASE-20202) [AMv2] Don't move region if its a split parent or offlined
[ https://issues.apache.org/jira/browse/HBASE-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16477967#comment-16477967 ] Sergey Soldatov commented on HBASE-20202: - [~stack] In your fix you added a check that a region is open to the MoveRegionProcedure constructor: {noformat} public MoveRegionProcedure(final MasterProcedureEnv env, final RegionPlan plan) throws HBaseIOException { super(env, plan.getRegionInfo()); this.plan = plan; preflightChecks(env, true); checkOnline(env, plan.getRegionInfo()); } {noformat} What will happen if the master crashes during UnassignProcedure execution and the region goes to CLOSED state? Recently we had a case where a new master trying to recover from the crash got stuck, because the region was in CLOSED state and this check prevented creating the procedure. FYI [~elserj] > [AMv2] Don't move region if its a split parent or offlined > -- > > Key: HBASE-20202 > URL: https://issues.apache.org/jira/browse/HBASE-20202 > Project: HBase > Issue Type: Sub-task > Components: amv2 >Affects Versions: 2.0.0-beta-2 >Reporter: stack >Assignee: stack >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-20202.branch-2.001.patch, > HBASE-20202.branch-2.002.patch, HBASE-20202.branch-2.003.patch, > HBASE-20202.branch-2.003.patch > > > Found this one running ITBLLs. We'd just finished splitting a region > 91655de06786f786b0ee9c51280e1ee6 and then a move for it came in. The move > fails in an interesting way. The location has been removed from the > regionnode kept by the Master. HBASE-20178 adds macro checks on context. Need > to add a few checks to the likes of MoveRegionProcedure so we don't try to > move an offlined/split parent. 
> {code} > 2018-03-14 10:21:45,678 INFO [PEWorker-2] procedure2.ProcedureExecutor: > Finished pid=3177, state=SUCCESS; SplitTableRegionProcedure > table=IntegrationTestBigLinkedList, parent=91655de06786f786b0ee9c51280e1ee6, > daughterA=b67bf6b79eaa83de788b0519f782ce8e, > daughterB=99cf6ddb38cad08e3aa7635b6cac2e7b in 10.0210sec > 2018-03-14 10:21:45,679 INFO [PEWorker-15] > procedure.MasterProcedureScheduler: pid=3194, ppid=3193, > state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, > server=ve0530.halxg.cloudera.com,16020,1521007509855, > IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0. > 2018-03-14 10:21:45,680 INFO [PEWorker-5] > procedure.MasterProcedureScheduler: pid=3187, > state=RUNNABLE:MOVE_REGION_UNASSIGN; MoveRegionProcedure > hri=IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6., > source=ve0530.halxg.cloudera.com,16020,1521007509855, > destination=ve0528.halxg.cloudera.com,16020,1521047890874, > IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6. 
> 2018-03-14 10:21:45,680 INFO [PEWorker-15] assignment.RegionStateStore: > pid=3194 updating hbase:meta > row=IntegrationTestBigLinkedList,\xAA\xAA\xAA\xAA\xAA\xAA\xAA\xA0,1521047891276.af198ca64b196fb3d2f5b3e815b2dad0., > regionState=CLOSING > 2018-03-14 10:21:45,680 INFO [PEWorker-5] procedure2.ProcedureExecutor: > Initialized subprocedures=[{pid=3195, ppid=3187, > state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, > server=ve0530.halxg.cloudera.com,16020,1521007509855}] > 2018-03-14 10:21:45,683 INFO [PEWorker-15] > assignment.RegionTransitionProcedure: Dispatch pid=3194, ppid=3193, > state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=IntegrationTestBigLinkedList, region=af198ca64b196fb3d2f5b3e815b2dad0, > server=ve0530.halxg.cloudera.com,16020,1521007509855; rit=CLOSING, > location=ve0530.halxg.cloudera.com,16020,1521007509855 > 2018-03-14 10:21:45,752 INFO [PEWorker-15] > procedure.MasterProcedureScheduler: pid=3195, ppid=3187, > state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, > server=ve0530.halxg.cloudera.com,16020,1521007509855, > IntegrationTestBigLinkedList,\x0C0\xC3\x0C0\xC3\x0C0,1521045713137.91655de06786f786b0ee9c51280e1ee6. > 2018-03-14 10:21:45,753 ERROR [PEWorker-15] procedure2.ProcedureExecutor: > CODE-BUG: Uncaught runtime exception: pid=3195, ppid=3187, > state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure > table=IntegrationTestBigLinkedList, region=91655de06786f786b0ee9c51280e1ee6, > server=ve0530.halxg.cloudera.com,16020,1521007509855 > java.lang.NullPointerException
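The failure mode discussed above can be sketched in a few lines. This is a simplified illustration, not the real HBase classes: a hypothetical guard in the spirit of checkOnline() that accepts only OPEN regions, which shows why a move procedure created during master recovery fails for a region left in CLOSED state.

```java
// Illustrative sketch only: RegionState and MoveRegionGuard are simplified
// stand-ins for HBase's real region-state machinery.
enum RegionState { OPEN, CLOSING, CLOSED, SPLIT, OFFLINE }

class MoveRegionGuard {
    // Mirrors the spirit of checkOnline(): reject anything that is not OPEN.
    static void checkOnline(RegionState state) {
        if (state != RegionState.OPEN) {
            // The real code throws DoNotRetryRegionException ("... is not OPEN").
            throw new IllegalStateException("region is not OPEN: " + state);
        }
    }

    // A move can only be scheduled for regions that pass the online check;
    // a region stranded in CLOSED by a master crash is permanently rejected.
    static boolean canMove(RegionState state) {
        try {
            checkOnline(state);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With a guard this strict at construction time, a recovering master that finds the region CLOSED (because the crash happened mid-unassign) can never build the procedure that would move the region back to OPEN, which is the stuck state the comment describes.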
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380788#comment-16380788 ] Sergey Soldatov commented on HBASE-19863: - [~elserj] well, my changes are quite simple. Was: a method that had *hcd.setBloomFilterType(BloomType.NONE);* Now: a new method that has *hcd.setBloomFilterType(type);* with an additional parameter for the bloom type, and we call *this* method from our test. The old method just calls the new one with BloomType.NONE as the parameter. So there is no change in behavior. > java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter > is used > - > > Key: HBASE-19863 > URL: https://issues.apache.org/jira/browse/HBASE-19863 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 1.4.1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, > HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, > HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, > HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-1.4.patch, > HBASE-19863.v5-branch-1.patch, HBASE-19863.v5-branch-2.patch > > > Under some circumstances scan with SingleColumnValueFilter may fail with an > exception > {noformat} > java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, > qualifier=C2, timestamp=1516433595543, comparison result: 1 > at > org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149) > at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > 
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) > {noformat} > Conditions: > table T with a single column family 0 that uses ROWCOL bloom filter > (important) and column qualifiers C1,C2,C3,C4,C5. > When we fill the table for every row we put deleted cell for C3. > The table has a single region with two HStore: > A: start row: 0, stop row: 99 > B: start row: 10 stop row: 99 > B has newer versions of rows 10-99. Store files have several blocks each > (important). > Store A is the result of major compaction, so it doesn't have any deleted > cells (important). > So, we are running a scan like: > {noformat} > scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter > ('0','C5',=,'binary:whatever')"} > {noformat} > How the scan performs: > First, we iterate A for rows 0 and 1 without any problems. > Next, we start to iterate A for row 10, so read the first cell and set hfs > scanner to A : > 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : > 10:0/C1/1/Put/x, > so we make B as our current store scanner. Since we are looking for > particular columns > C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn > which > would run reseek for all store scanners. > For store A the following magic would happen in requestSeek: > 1. 
bloom filter check passesGeneralBloomFilter would set haveToSeek to > false because row 10 doesn't have C3 qualifier in store A. > 2. Since we don't have to seek we just create a fake row > 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for > us and it commented with : > {noformat} > // Multi-column Bloom filter optimization. > // Create a fake key/value, so that this scanner only bubbles up to the > top > // of the KeyValueHeap in StoreScanner after we scanned this row/column in > // all other store files. The query matcher will then just skip this fake > // key/value and the store scanner will progress to the next column. This > // is obviously not a "real real" seek, but
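The fake-key trick described above relies on HBase's cell sort order: within one row/column, cells sort with timestamps descending (newest first), so a fake cell stamped with OLDEST_TIMESTAMP sorts after every real cell for that column and only bubbles to the top of the KeyValueHeap once the column has been exhausted in the other store files. A minimal sketch of that ordering (the Key record and comparator here are illustrative simplifications, not HBase's CellComparator):

```java
import java.util.Comparator;

// Illustrative stand-in for an HBase key: qualifier plus timestamp.
class FakeKeyOrder {
    // HBase's HConstants.OLDEST_TIMESTAMP is the smallest possible stamp.
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

    record Key(String qualifier, long timestamp) {}

    // Qualifier ascending, then timestamp DESCENDING (newest cell first),
    // which is the ordering rule that makes the fake key sort last.
    static final Comparator<Key> CMP = Comparator
        .comparing(Key::qualifier)
        .thenComparing(Comparator.comparingLong(Key::timestamp).reversed());
}
```

Because of the descending-timestamp rule, a real cell for C3 at any timestamp compares before the fake 10:0/C3/OLDEST_TIMESTAMP key, while the fake key still compares before any cell of the next column C5.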
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380728#comment-16380728 ] Sergey Soldatov commented on HBASE-19863: - [~ram_krish] actually we were setting it to NONE by default: {noformat} - // Disable blooms (they are on by default as of 0.95) but we disable them here because - // tests have hard coded counts of what to expect in block cache, etc., and blooms being - // on is interfering. - builder.addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(family) - .setBloomFilterType(BloomType.NONE) - .build()); {noformat} That's the part of the code that this method had previously.
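The test-helper change described in the comments is a plain overload-and-delegate refactor: the old method keeps hard-coding BloomType.NONE while a new overload takes the bloom type as a parameter. A self-contained sketch of the pattern (FamilySpec and buildFamily are hypothetical names, not HBase's actual test utility):

```java
// Illustrative stand-ins for the HBase test utility types.
enum BloomType { NONE, ROW, ROWCOL }

class FamilySpec {
    final BloomType bloomType;
    FamilySpec(BloomType bloomType) { this.bloomType = bloomType; }
}

class TestTableHelper {
    // Old entry point: unchanged behavior, blooms still disabled by default,
    // so existing callers are unaffected.
    static FamilySpec buildFamily() {
        return buildFamily(BloomType.NONE);
    }

    // New overload: lets a test request ROWCOL, which is what the
    // HBASE-19863 reproduction needs.
    static FamilySpec buildFamily(BloomType type) {
        return new FamilySpec(type);
    }
}
```

Delegating from the old signature to the new one is why the comment can claim "no change in behavior": only tests that explicitly pass a bloom type see anything different.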
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Attachment: HBASE-19863.v5-branch-1.4.patch
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Attachment: HBASE-19863.v5-branch-1.patch
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377500#comment-16377500 ] Sergey Soldatov commented on HBASE-19863: - [~yuzhih...@gmail.com] addressed your comments in v5. [~chia7712] Sure will do it today.
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:
Attachment: HBASE-19863.v5-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 1.4.1
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch, HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch, HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch, HBASE-19863.v4-master.patch, HBASE-19863.v5-branch-2.patch
>
> Under some circumstances a scan with SingleColumnValueFilter may fail with an exception:
> {noformat}
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, qualifier=C2, timestamp=1516433595543, comparison result: 1
>   at org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> Table T has a single column family 0 that uses a ROWCOL bloom filter (important) and column qualifiers C1, C2, C3, C4, C5.
> When we fill the table, for every row we put a deleted cell for C3.
> The table has a single region with two HStores:
> A: start row: 0, stop row: 99
> B: start row: 10, stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each (important).
> Store A is the result of a major compaction, so it doesn't have any deleted cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter ('0','C5',=,'binary:whatever')"}
> {noformat}
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems.
> Next, we start to iterate A for row 10: we read the first cell and set the hfile scanner of A to 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 10:0/C1/1/Put/x, so we make B our current store scanner. Since we are looking for the particular columns C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn, which runs reseek for all store scanners.
> For store A the following happens in requestSeek:
> 1. The bloom filter check passesGeneralBloomFilter sets haveToSeek to false because row 10 doesn't have the C3 qualifier in store A.
> 2. Since we don't have to seek, we just create a fake key 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for us and is commented with:
> {noformat}
> // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
> For store B we set the scanner to the fake key 10:0/C3/createFirstOnRowColTS()/Maximum to skip C3 entirely.
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn, which first runs trySkipToNextColumn:
> {noformat}
> protected boolean trySkipToNextColumn(Cell cell) throws IOException {
>   Cell nextCell =
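The two pieces of machinery described above — a delete tracker that assumes cells reach it in column order, and the bloom-filter fake key carrying OLDEST_TIMESTAMP — can be modeled in plain Java. This is a minimal sketch, not HBase's actual classes; the class and method names here are illustrative, and the timestamp check on the fake key corresponds to the simple OLDEST_TIMESTAMP test later suggested in HBASE-28055 rather than to the compareKeyForNextColumn path:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model (illustrative names, not HBase code) of:
//  * a delete tracker whose ordering invariant produces the
//    "isDelete failed" IllegalStateException when violated, and
//  * the bloom-filter fake key, recognizable by its sentinel
//    timestamp alone, with no column comparison needed.
public class ScanModel {
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE; // sentinel, as in HConstants

    static final class Cell {
        final String qualifier;
        final long timestamp;
        Cell(String qualifier, long timestamp) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }
        boolean isBloomFilterFake() {
            // A real cell never carries OLDEST_TIMESTAMP, so the marker
            // identifies the fabricated key by timestamp alone.
            return timestamp == OLDEST_TIMESTAMP;
        }
    }

    // Ordering invariant: once a delete marker for qualifier Q is
    // buffered, every later cell of the row must sort >= Q. A cell
    // sorting before it reproduces the failure state quoted above.
    static final class DeleteTracker {
        private String deleteBuffer;
        void addDelete(String qualifier) { deleteBuffer = qualifier; }
        boolean isDeleted(Cell cell) {
            if (deleteBuffer == null) return false;
            int cmp = deleteBuffer.compareTo(cell.qualifier);
            if (cmp == 0) return true;   // column covered by the delete
            if (cmp < 0) {               // scanner moved past the marker
                deleteBuffer = null;
                return false;
            }
            throw new IllegalStateException("isDelete failed: deleteBuffer="
                + deleteBuffer + ", qualifier=" + cell.qualifier
                + ", comparison result: " + cmp);
        }
    }

    // Feed a row's cells through the tracker, dropping fake keys first.
    static List<String> visibleColumns(List<Cell> cells, DeleteTracker tracker) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.isBloomFilterFake()) continue; // skip, never reseek
            if (!tracker.isDeleted(c)) out.add(c.qualifier);
        }
        return out;
    }
}
```

With the timestamp check in front, a fake 10:0/C3/OLDEST_TIMESTAMP key is dropped before it ever reaches the tracker; feeding the tracker an out-of-order qualifier (C2 after a buffered delete for C3) raises exactly the IllegalStateException quoted in the description.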
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:
Attachment: HBASE-19863.v4-branch-2.patch
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:
Attachment: (was: HBASE-19863.v4-branch-2.patch)
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375057#comment-16375057 ] Sergey Soldatov commented on HBASE-19863:
- [~yuzhih...@gmail.com] Ough. Didn't notice that. Reattached.
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375017#comment-16375017 ] Sergey Soldatov commented on HBASE-19863:
- That's exactly what I have in the patch:
{noformat}
+@Category({ RegionServerTests.class, FilterTests.class, MediumTests.class })
+public class TestIsDeleteFailure {
+  private static final HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility();
+
+  @ClassRule
+  public static final HBaseClassTestRule CLASS_RULE =
+      HBaseClassTestRule.forClass(TestIsDeleteFailure.class);
+
{noformat}
Both patches for branch-2 and master are the same, but this failure happens only on branch-2 and is not reproducible locally.
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374898#comment-16374898 ] Sergey Soldatov commented on HBASE-19863:
- [~yuzhih...@gmail.com] That looks like a flaky test, and the failure is not related to the patch. It passed locally and, as we can see, it passed the branch-2 precommit test. I'm more worried about the failure for the branch-2 patch. I don't understand where this assertion error comes from:
{noformat}
java.lang.AssertionError: No HBaseClassTestRule ClassRule for org.apache.hadoop.hbase.regionserver.TestIsDeleteFailure
{noformat}
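The branch-2 assertion discussed above comes from a convention check: each test class is expected to declare a static HBaseClassTestRule field (the CLASS_RULE shown in the patch snippet). A toy version of such a presence check, using only reflection — the names here are illustrative stand-ins, not HBase's actual checker:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Toy sketch of a "class rule present" convention check: scan a test
// class's declared fields for a static field of the expected rule type.
// FakeRule stands in for HBaseClassTestRule; names are illustrative.
public class ClassRuleCheck {
    static boolean hasStaticFieldOfType(Class<?> testClass, Class<?> ruleType) {
        for (Field f : testClass.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == ruleType) {
                return true;
            }
        }
        return false;
    }

    static final class FakeRule {}

    // Follows the convention: a static field of the rule type is declared.
    static final class GoodTest {
        static final FakeRule CLASS_RULE = new FakeRule();
    }

    // Violates the convention: no rule field, so a checker would fail it.
    static final class BadTest {}
}
```

A checker built this way would reject BadTest with a message much like the AssertionError above.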
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:
Attachment: HBASE-19863.v4-master.patch
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v4-branch-2.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
> -
>
> Key: HBASE-19863
> URL: https://issues.apache.org/jira/browse/HBASE-19863
> Project: HBase
> Issue Type: Bug
> Components: Filters
> Affects Versions: 1.4.1
> Reporter: Sergey Soldatov
> Assignee: Sergey Soldatov
> Priority: Major
> Attachments: HBASE-19863-branch-2.patch, HBASE-19863-branch1.patch,
> HBASE-19863-test.patch, HBASE-19863.v2-branch-2.patch,
> HBASE-19863.v3-branch-2.patch, HBASE-19863.v4-branch-2.patch
>
> Under some circumstances a scan with SingleColumnValueFilter may fail with an exception:
> {noformat}
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, qualifier=C2, timestamp=1516433595543, comparison result: 1
>   at org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions: table T with a single column family 0 that uses a ROWCOL bloom filter (important) and column qualifiers C1, C2, C3, C4, C5.
> When we fill the table, for every row we put a deleted cell for C3.
> The table has a single region with two HStores:
> A: start row: 0, stop row: 99
> B: start row: 10, stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each (important).
> Store A is the result of a major compaction, so it doesn't have any deleted cells (important).
> So, we run a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter ('0','C5',=,'binary:whatever')"}
> {noformat}
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems.
> Next, we start to iterate A for row 10: we read the first cell and set the hfs scanner to A: 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 10:0/C1/1/Put/x, so we make B our current store scanner. Since we are looking for the particular columns C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn, which runs reseek for all store scanners.
> For store A the following magic happens in requestSeek:
> 1. The bloom filter check passesGeneralBloomFilter sets haveToSeek to false because row 10 doesn't have the C3 qualifier in store A.
> 2. Since we don't have to seek, we just create a fake row 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for us and is commented with:
> {noformat}
> // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
>
> For store B we would set it to the fake 10:0/C3/createFirstOnRowColTS()/Maximum to skip C3 entirely.
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn, which first runs trySkipToNextColumn:
> {noformat}
> protected boolean trySkipToNextColumn(Cell cell) throws IOException {
>   Cell nextCell = null;
>   do {
>     Cell nextIndexedKey = getNextIndexedKey();
>     if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY
> {noformat}
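The Bloom filter fake-key trick described above can be modeled outside HBase. The sketch below is a simplified illustration, not HBase code: `FakeKeyDemo`, its `Cell` record, and the exact-set "Bloom filter" are hypothetical stand-ins (a real Bloom filter can return false positives, and real HBase cells are compared by the full cell comparator, not just timestamp).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FakeKeyDemo {
    // In HBase, fake cells are stamped with the oldest possible timestamp so they
    // sort after every real version of the same row/column.
    static final long OLDEST_TIMESTAMP = Long.MIN_VALUE;

    // Minimal cell: row, qualifier, timestamp. Fake cells carry OLDEST_TIMESTAMP.
    record Cell(String row, String qualifier, long ts) {
        boolean isFake() { return ts == OLDEST_TIMESTAMP; }
    }

    // Model of requestSeek: when the Bloom filter says (row, qualifier) is absent
    // from this store file, skip the real (expensive) seek and return a fake cell
    // that only bubbles to the top of the KeyValueHeap after all other scanners
    // have passed this row/column.
    static Cell requestSeek(Set<String> bloomRowCols, String row, String qualifier) {
        if (!bloomRowCols.contains(row + ":" + qualifier)) {
            return new Cell(row, qualifier, OLDEST_TIMESTAMP); // fake KV, no disk seek
        }
        return new Cell(row, qualifier, 1L); // pretend a real seek landed on a real cell
    }

    // The query matcher's role in this scheme: skip fake cells rather than
    // returning them to the client.
    static List<Cell> filterFakes(List<Cell> heapOutput) {
        List<Cell> real = new ArrayList<>();
        for (Cell c : heapOutput) {
            if (!c.isFake()) {
                real.add(c);
            }
        }
        return real;
    }

    public static void main(String[] args) {
        // Store A was major-compacted and has no C3 cell for row 10,
        // so its ROWCOL Bloom filter has no entry for 10:C3.
        Set<String> storeABloom = Set.of("10:C1", "10:C5");
        Cell c3 = requestSeek(storeABloom, "10", "C3");
        System.out.println("fake? " + c3.isFake());
        System.out.println("real cells: "
            + filterFakes(List.of(c3, requestSeek(storeABloom, "10", "C5"))));
    }
}
```

The bug report above is essentially about what happens when one of these fake cells, instead of being quietly skipped, participates in the column-skipping optimization.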
[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367941#comment-16367941 ] Sergey Soldatov commented on HBASE-19805:

[~stack] Sorry, I missed the previous notification. Last time I ended up with some weird scenarios where everything got stuck without any visible problems, and I didn't have a chance to dig into it due to $dayjob. I will try to get back to this during the weekend.

> NPE in HMaster while issuing a sequence of table splits
> ---
>
> Key: HBASE-19805
> URL: https://issues.apache.org/jira/browse/HBASE-19805
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 2.0.0-beta-1
> Reporter: Josh Elser
> Assignee: Sergey Soldatov
> Priority: Critical
> Fix For: 2.0.0
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the first few region splits, I see the following error in the Master log.
> {noformat}
> 2018-01-16 14:07:52,797 INFO [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: Client=jelser//192.168.1.23 split myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.<init>(SplitTableRegionProcedure.java:103)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
> public static void main(String[] args) throws Exception {
>   Configuration conf = HBaseConfiguration.create();
>   try (Connection conn = ConnectionFactory.createConnection(conf);
>       Admin admin = conn.getAdmin()) {
>     final TableName tn = TableName.valueOf("myTestTable");
>     if (admin.tableExists(tn)) {
>       admin.disableTable(tn);
>       admin.deleteTable(tn);
>     }
>     final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>         .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>         .build();
>     admin.createTable(desc);
>     List<String> splitPoints = new ArrayList<>(16);
>     for (int i = 1; i <= 16; i++) {
>       splitPoints.add(Integer.toString(i, 16));
>     }
>     System.out.println("Splits: " + splitPoints);
>     int numRegions = admin.getRegions(tn).size();
>     for (String splitPoint : splitPoints) {
>       System.out.println("Splitting on " + splitPoint);
>       admin.split(tn, Bytes.toBytes(splitPoint));
>       Thread.sleep(200);
>       int newRegionSize = admin.getRegions(tn).size();
>       while (numRegions == newRegionSize) {
>         Thread.sleep(50);
>         newRegionSize = admin.getRegions(tn).size();
>       }
>     }
> {code}
> At a quick glance, it looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
> static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv env,
>     final ServerName regionLocation, final RegionInfo hri, boolean includeBestSplitRow)
>     throws IOException {
>   // TODO: There is no timeout on this controller. Set one!
>   HBaseRpcController controller = env.getMasterServices().getClusterConnection().
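The core pattern of the toy program above is "issue a split, then poll the region count until it changes". A minimal, HBase-free sketch of that wait loop follows; `waitForRegionCountChange` is a hypothetical helper (not an Admin API), and the `IntSupplier` stands in for `admin.getRegions(tn).size()`:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class SplitWaitDemo {
    // Poll regionCount until it differs from `before`, sleeping between polls
    // (mirrors the Thread.sleep(50) loop in the toy program). Returns the new count.
    static int waitForRegionCountChange(IntSupplier regionCount, int before, long pollMillis) {
        int now = regionCount.getAsInt();
        while (now == before) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and bail out
                return now;
            }
            now = regionCount.getAsInt();
        }
        return now;
    }

    public static void main(String[] args) {
        // Simulate a split that completes on the third poll.
        AtomicInteger polls = new AtomicInteger();
        IntSupplier regionCount = () -> polls.incrementAndGet() >= 3 ? 2 : 1;
        System.out.println("regions after split: " + waitForRegionCountChange(regionCount, 1, 1L));
    }
}
```

Note the loop has no timeout, so like the toy program it hangs forever if the split never completes; a production version would bound the total wait.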
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366693#comment-16366693 ] Sergey Soldatov commented on HBASE-19863:

[~chia7712] I've moved the test to a separate file. I don't think it affects trySkipToNextRow: this problem happens inside a single row only, so even if we jump back when getting a new top, we will just iterate using next() to the next row anyway. As for changing realSeekDone to false, I was thinking that it was done specifically to avoid some seeks (not only in the case we are discussing). So that kind of change may cause a performance degradation in some cases?
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v3-branch-2.patch
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363718#comment-16363718 ] Sergey Soldatov commented on HBASE-19863:

[~yuzhih...@gmail.com] You mean checkstyle? Addressed that in the v2 patch.
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863.v2-branch-2.patch
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863:

Attachment: HBASE-19863-branch-2.patch
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361916#comment-16361916 ] Sergey Soldatov commented on HBASE-19863: - [~ram_krish] thank you, sir! I will prepare the final patch tomorrow (it would require some changes in test harness to make bloom filter configurable). > java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter > is used > - > > Key: HBASE-19863 > URL: https://issues.apache.org/jira/browse/HBASE-19863 > Project: HBase > Issue Type: Bug > Components: Filters >Affects Versions: 1.4.1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov >Priority: Major > Attachments: HBASE-19863-branch1.patch, HBASE-19863-test.patch > > > Under some circumstances scan with SingleColumnValueFilter may fail with an > exception > {noformat} > java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, > qualifier=C2, timestamp=1516433595543, comparison result: 1 > at > org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149) > at > org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545) > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027) > at > org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > at > 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) > {noformat} > Conditions: > table T with a single column family 0 that uses ROWCOL bloom filter > (important) and column qualifiers C1,C2,C3,C4,C5. > When we fill the table for every row we put deleted cell for C3. > The table has a single region with two HStore: > A: start row: 0, stop row: 99 > B: start row: 10 stop row: 99 > B has newer versions of rows 10-99. Store files have several blocks each > (important). > Store A is the result of major compaction, so it doesn't have any deleted > cells (important). > So, we are running a scan like: > {noformat} > scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter > ('0','C5',=,'binary:whatever')"} > {noformat} > How the scan performs: > First, we iterate A for rows 0 and 1 without any problems. > Next, we start to iterate A for row 10, so read the first cell and set hfs > scanner to A : > 10:0/C1/0/Put/x but found that we have a newer version of the cell in B : > 10:0/C1/1/Put/x, > so we make B as our current store scanner. Since we are looking for > particular columns > C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn > which > would run reseek for all store scanners. > For store A the following magic would happen in requestSeek: > 1. bloom filter check passesGeneralBloomFilter would set haveToSeek to > false because row 10 doesn't have C3 qualifier in store A. > 2. Since we don't have to seek we just create a fake row > 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for > us and it commented with : > {noformat} > // Multi-column Bloom filter optimization. > // Create a fake key/value, so that this scanner only bubbles up to the > top > // of the KeyValueHeap in StoreScanner after we scanned this row/column in > // all other store files. 
The query matcher will then just skip this fake > // key/value and the store scanner will progress to the next column. This > // is obviously not a "real real" seek, but unlike the fake KV earlier in > // this method, we want this to be propagated to ScanQueryMatcher. > {noformat} > > For store B we would set it to fake 10:0/C3/createFirstOnRowColTS()/Maximum > to skip C3 entirely. > After that we start searching for qualifier C5 using seekOrSkipToNextColumn > which > first runs trySkipToNextColumn: > {noformat} > protected boolean trySkipToNextColumn(Cell cell) throws IOException { > Cell nextCell = null; > do { > Cell
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355021#comment-16355021 ] Sergey Soldatov commented on HBASE-19863: - [~anoop.hbase] thank you, sir! The question is whether it's OK to do a seek if, during heap.next(), we choose another scanner that actually lags behind (for example, cur points to C10 but hfs is still pointing to C1). If we try to skip here we may break the order (so at the moment we are at C10, but on the next call we will get C2 instead of C11). I tried to find a better place to fix it but didn't find one.
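The ordering hazard described in the comment above can be illustrated with a toy scanner. SimpleScanner and the string keys below are hypothetical stand-ins for HBase's KeyValueScanner and cell comparisons, not the actual API; this is only a sketch of why a lagging scanner must be reseeked before it is trusted.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

public class HeapLagDemo {
    /** Toy stand-in for a store scanner: a sorted run of string "cells". */
    static final class SimpleScanner {
        private final ArrayDeque<String> cells;
        SimpleScanner(String... keys) { cells = new ArrayDeque<>(Arrays.asList(keys)); }
        String peek() { return cells.peek(); }
        /** Crude reseek: drop every cell that sorts before {@code key}. */
        void reseek(String key) {
            while (cells.peek() != null && cells.peek().compareTo(key) < 0) cells.poll();
        }
    }

    public static void main(String[] args) {
        // Scanner A was left behind at c02 by a fake-key "seek" while the scan
        // already returned c10 from another scanner.
        SimpleScanner lagging = new SimpleScanner("c02", "c12");
        String position = "c10"; // last key the scan handed to the matcher

        // Without a real seek, the lagging scanner's head is behind the scan
        // position -- the kind of regression that trips the delete tracker.
        boolean regressed = lagging.peek().compareTo(position) < 0;

        // The remedy discussed in this thread: force a real (re)seek first.
        lagging.reseek(position);
        System.out.println(regressed + " -> next cell " + lagging.peek());
    }
}
```

With the reseek in place the scanner resumes at c12 instead of surfacing the stale c02.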
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352674#comment-16352674 ] Sergey Soldatov commented on HBASE-19863: - bq. I agree to this. But my arg would be that if getNextIndexedKey() is a key with the next row It's usually a key with the next row, and in this case check (1) just returns a value greater than zero, so we perform next() once. After that, in check (2) we just verify that we are still in the same column, and if not we return true, avoiding the call to reseek. bq. Expectation is for compareKeyForNextColumn() check with nextcolumn. According to the method name that's true, but according to the comments it is just a "result of the compare between the indexed key and the key portion of the passed cell". I believe all this logic fits [~lhofhansl]'s SEEK vs SKIP optimization implemented in HBASE-13109.
[jira] [Comment Edited] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345868#comment-16345868 ] Sergey Soldatov edited comment on HBASE-19863 at 1/30/18 10:07 PM: --- [~ram_krish] That's an interesting question: what is trySkipToNextColumn supposed to do? If I understand correctly, it just tries to really skip to the next column in the row without a reseek. There are two checks: 1. {noformat} matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0 {noformat} The purpose of this check is only to verify that we are still in the same block, so no reseek is needed. 2. {noformat} CellUtil.matchingRowColumn(cell, nextCell) {noformat} That works when we iterate through versions of the same cell. Based on those two conditions we decide whether we need to reseek (at the moment, only if we reached the block end or there are no indexed keys at all) or just continue to iterate in StoreScanner.next with the next column. So my fix just handles the case when heap.next() changes the top scanner to one that lags behind, so the second check thinks we are on the next cell while actually we just jumped back and need to perform a reseek. And yes, the original description is just a corner case of the wider problem where we change our top scanner to one that skipped the last reseek due to the optimization with fake KVs.
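A schematic version of the two checks, plus the extra guard the patch adds, might look like the following. Plain "row:col" strings stand in for cells (so versions of a cell compare equal), and the helper signatures are simplifications, not HBase's real matcher API.

```java
import java.util.Iterator;
import java.util.List;

public class SkipSketch {
    /**
     * Simplified trySkipToNextColumn: returns true if iterating with next()
     * reached the next column, false if the caller must fall back to reseek.
     */
    static boolean trySkipToNextColumn(Iterator<String> heap, String cell,
                                       String nextIndexedKey) {
        String nextCell;
        do {
            // Check 1: the next index block must not start before the cell we
            // want to leave; otherwise a reseek can jump whole blocks instead.
            if (nextIndexedKey == null || nextIndexedKey.compareTo(cell) < 0) {
                return false;
            }
            if (!heap.hasNext()) {
                return true;
            }
            nextCell = heap.next();
            // The guard the fix adds: a lagging scanner surfaced a cell
            // *behind* the current one, so iterating would break ordering.
            if (nextCell.compareTo(cell) < 0) {
                return false;
            }
        } while (nextCell.equals(cell)); // Check 2: keep skipping same row/column
        return true;
    }

    public static void main(String[] args) {
        // Versions of r:C3 are skipped in-block, landing on r:C5.
        System.out.println(trySkipToNextColumn(
            List.of("r:C3", "r:C3", "r:C5").iterator(), "r:C3", "r:C9"));
        // A lagging scanner surfaces r:C1 -> the caller must reseek.
        System.out.println(trySkipToNextColumn(
            List.of("r:C1", "r:C5").iterator(), "r:C3", "r:C9"));
    }
}
```

The first call succeeds by iteration alone; the second returns false, which is exactly the signal that forces the explicit reseek.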
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Status: Patch Available (was: Open) A WIP patch to deal with this issue. If, while attempting to skip to the next column, we hit the case where the top store scanner has changed and the new head is behind the cell we are looking for, we just return false to force a reseek in seekOrSkipToNextColumn. Not sure yet whether it covers all possible scenarios.
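The behavior the WIP patch describes (returning false so that seekOrSkipToNextColumn falls back to an explicit reseek) can be sketched as follows. The TreeSet "store" and the nextColumnKey argument are illustrative simplifications, not HBase code.

```java
import java.util.List;
import java.util.TreeSet;

public class SeekOrSkipSketch {
    /** Illustrative reseek: position at the first cell >= key. */
    static String reseek(TreeSet<String> store, String key) {
        return store.ceiling(key);
    }

    /**
     * If the heap's new head is behind the cell we are scanning (a lagging
     * scanner surfaced), skipping is unsafe, so force a reseek at the next
     * column; otherwise keep iterating from the head.
     */
    static String seekOrSkipToNextColumn(TreeSet<String> store, String heapHead,
                                         String cell, String nextColumnKey) {
        if (heapHead.compareTo(cell) < 0) {
            return reseek(store, nextColumnKey); // the forced-reseek path
        }
        return heapHead; // safe to continue without a seek
    }

    public static void main(String[] args) {
        TreeSet<String> store = new TreeSet<>(List.of("r:C1", "r:C2", "r:C5"));
        // Scan is at r:C3, but the new head jumped back to r:C1 -> reseek.
        System.out.println(seekOrSkipToNextColumn(store, "r:C1", "r:C3", "r:C5"));
    }
}
```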
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19863: Attachment: HBASE-19863-branch1.patch
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343845#comment-16343845 ]

Sergey Soldatov commented on HBASE-19863:
-----------------------------------------

bq. In this place we try to do a real seek right before actually using that scanner?
I remember this happening when we are on a fake key. There are some exceptions, StoreScanner.trySkipToNextColumn being one example. If we have indexed keys, we just iterate through the heap without a reseek (which may happen in seekOrSkipToNextColumn if trySkipToNextColumn returns false).

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-19863
>                 URL: https://issues.apache.org/jira/browse/HBASE-19863
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 1.4.1
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>         Attachments: HBASE-19863-test.patch
>
> Under some circumstances a scan with SingleColumnValueFilter may fail with an exception:
> {noformat}
> java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, qualifier=C2, timestamp=1516433595543, comparison result: 1
>   at org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
>   at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
>   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
> {noformat}
> Conditions:
> Table T with a single column family 0 that uses a ROWCOL bloom filter (important) and column qualifiers C1,C2,C3,C4,C5.
> When we fill the table, for every row we put a deleted cell for C3.
> The table has a single region with two HStores:
> A: start row: 0, stop row: 99
> B: start row: 10, stop row: 99
> B has newer versions of rows 10-99. Store files have several blocks each (important).
> Store A is the result of a major compaction, so it doesn't have any deleted cells (important).
> So, we are running a scan like:
> {noformat}
> scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter ('0','C5',=,'binary:whatever')"}
> {noformat}
> How the scan performs:
> First, we iterate A for rows 0 and 1 without any problems.
> Next, we start to iterate A for row 10: we read the first cell and set the hfs scanner for A to 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 10:0/C1/1/Put/x, so we make B our current store scanner. Since we are looking for the particular columns C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn, which would run a reseek for all store scanners.
> For store A the following magic would happen in requestSeek:
> 1. The bloom filter check passesGeneralBloomFilter would set haveToSeek to false because row 10 doesn't have the C3 qualifier in store A.
> 2. Since we don't have to seek, we just create a fake row 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for us and is commented with:
> {noformat}
> // Multi-column Bloom filter optimization.
> // Create a fake key/value, so that this scanner only bubbles up to the top
> // of the KeyValueHeap in StoreScanner after we scanned this row/column in
> // all other store files. The query matcher will then just skip this fake
> // key/value and the store scanner will progress to the next column. This
> // is obviously not a "real real" seek, but unlike the fake KV earlier in
> // this method, we want this to be propagated to ScanQueryMatcher.
> {noformat}
>
> For store B we would set it to the fake 10:0/C3/createFirstOnRowColTS()/Maximum to skip C3 entirely.
> After that we start searching for qualifier C5 using seekOrSkipToNextColumn
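The "bubbles up to the top of the KeyValueHeap last" behavior rests on KeyValue timestamp ordering. Below is a minimal, self-contained illustration with simulated keys, not the real HBase classes; it assumes OLDEST_TIMESTAMP is the minimum long value, and sorts timestamps in descending order the way KeyValue comparison does, so the fake key sorts after every real version of the same column:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration (simulated keys, NOT HBase code) of why the fake
// 10:0/C3/OLDEST_TIMESTAMP key surfaces only after the real cells:
// column ascending, then timestamp descending, puts the oldest
// possible timestamp last within the column.
public class FakeKeyOrdering {
    public static final long OLDEST_TIMESTAMP = Long.MIN_VALUE; // assumed marker value

    public record Key(String col, long ts) {}

    // Sorts keys the way the KeyValueHeap would order them within one row.
    public static List<Key> sortLikeKeyValueHeap(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(Key::col)
            .thenComparing(Comparator.comparingLong(Key::ts).reversed()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Key> sorted = sortLikeKeyValueHeap(List.of(
            new Key("C3", OLDEST_TIMESTAMP),      // fake key from store A
            new Key("C3", 5), new Key("C3", 9))); // real cells from store B
        for (Key k : sorted) {
            // real cells print first (newest to oldest), fake key prints last
            System.out.println(k.col() + "/" + k.ts());
        }
    }
}
```

Because the fake key sorts last, every other store's cells for that row/column are consumed before the query matcher ever sees it, which is exactly what the quoted comment describes.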
[jira] [Commented] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16343811#comment-16343811 ]

Sergey Soldatov commented on HBASE-19863:
-----------------------------------------

Attached a test case that reproduces the problem for branch-1. It has some changes in HBaseTestingUtility.createTable to set up the required conditions (the bloom filter as well as the block size). Please let me know whether it helps to reproduce the problem.

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-19863
>                 URL: https://issues.apache.org/jira/browse/HBASE-19863
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 1.4.1
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>         Attachments: HBASE-19863-test.patch
[jira] [Updated] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
[ https://issues.apache.org/jira/browse/HBASE-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Soldatov updated HBASE-19863:
------------------------------------
    Attachment: HBASE-19863-test.patch

> java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-19863
>                 URL: https://issues.apache.org/jira/browse/HBASE-19863
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 1.4.1
>            Reporter: Sergey Soldatov
>            Assignee: Sergey Soldatov
>            Priority: Major
>         Attachments: HBASE-19863-test.patch
[jira] [Created] (HBASE-19863) java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
Sergey Soldatov created HBASE-19863:
------------------------------------

             Summary: java.lang.IllegalStateException: isDelete failed when SingleColumnValueFilter is used
                 Key: HBASE-19863
                 URL: https://issues.apache.org/jira/browse/HBASE-19863
             Project: HBase
          Issue Type: Bug
          Components: Filters
    Affects Versions: 1.4.1
            Reporter: Sergey Soldatov
            Assignee: Sergey Soldatov

Under some circumstances a scan with SingleColumnValueFilter may fail with an exception:
{noformat}
java.lang.IllegalStateException: isDelete failed: deleteBuffer=C3, qualifier=C2, timestamp=1516433595543, comparison result: 1
  at org.apache.hadoop.hbase.regionserver.ScanDeleteTracker.isDeleted(ScanDeleteTracker.java:149)
  at org.apache.hadoop.hbase.regionserver.ScanQueryMatcher.match(ScanQueryMatcher.java:386)
  at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:545)
  at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5876)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:6027)
  at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5814)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2552)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
{noformat}
Conditions:
Table T with a single column family 0 that uses a ROWCOL bloom filter (important) and column qualifiers C1,C2,C3,C4,C5.
When we fill the table, for every row we put a deleted cell for C3.
The table has a single region with two HStores:
A: start row: 0, stop row: 99
B: start row: 10, stop row: 99
B has newer versions of rows 10-99. Store files have several blocks each (important).
Store A is the result of a major compaction, so it doesn't have any deleted cells (important).
So, we are running a scan like:
{noformat}
scan 'T', { COLUMNS => ['0:C3','0:C5'], FILTER => "SingleColumnValueFilter ('0','C5',=,'binary:whatever')"}
{noformat}
How the scan performs:
First, we iterate A for rows 0 and 1 without any problems.
Next, we start to iterate A for row 10: we read the first cell and set the hfs scanner for A to 10:0/C1/0/Put/x, but find that we have a newer version of the cell in B: 10:0/C1/1/Put/x, so we make B our current store scanner. Since we are looking for the particular columns C3 and C5, we perform the optimization StoreScanner.seekOrSkipToNextColumn, which would run a reseek for all store scanners.
For store A the following magic would happen in requestSeek:
1. The bloom filter check passesGeneralBloomFilter would set haveToSeek to false because row 10 doesn't have the C3 qualifier in store A.
2. Since we don't have to seek, we just create a fake row 10:0/C3/OLDEST_TIMESTAMP/Maximum, an optimization that is quite important for us and is commented with:
{noformat}
// Multi-column Bloom filter optimization.
// Create a fake key/value, so that this scanner only bubbles up to the top
// of the KeyValueHeap in StoreScanner after we scanned this row/column in
// all other store files. The query matcher will then just skip this fake
// key/value and the store scanner will progress to the next column. This
// is obviously not a "real real" seek, but unlike the fake KV earlier in
// this method, we want this to be propagated to ScanQueryMatcher.
{noformat}

For store B we would set it to the fake 10:0/C3/createFirstOnRowColTS()/Maximum to skip C3 entirely.
After that we start searching for qualifier C5 using seekOrSkipToNextColumn, which first runs trySkipToNextColumn:
{noformat}
  protected boolean trySkipToNextColumn(Cell cell) throws IOException {
    Cell nextCell = null;
    do {
      Cell nextIndexedKey = getNextIndexedKey();
      if (nextIndexedKey != null && nextIndexedKey != KeyValueScanner.NO_NEXT_INDEXED_KEY
          && matcher.compareKeyForNextColumn(nextIndexedKey, cell) >= 0) {
        this.heap.next();
        ++kvsScanned;
      } else {
        return false;
      }
    } while ((nextCell = this.heap.peek()) != null && CellUtil.matchingRowColumn(cell, nextCell));
    return true;
  }
{noformat}
If the store has several blocks, then nextIndexedKey would not be null and compareKeyForNextColumn wouldn't be negative, so we keep searching forward until we hit the next index or the end of the row. But in this.heap.next(), the scanner for A bubbles
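The loop above steps through the heap cell by cell rather than reseeking past the column. A toy model of that forward walk (simulated keys, not the actual StoreScanner code) makes the cost visible — one next() call per remaining version of the column:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model (NOT the real HBase classes) of the forward walk in
// trySkipToNextColumn: while the peeked cell is still in the same
// row/column, the loop burns one heap.next() per cell instead of
// issuing a single reseek past the column.
public class TrySkipDemo {
    public record Key(String row, String col, long ts) {}

    // analogue of CellUtil.matchingRowColumn
    public static boolean sameRowCol(Key a, Key b) {
        return a.row().equals(b.row()) && a.col().equals(b.col());
    }

    // Returns how many next() calls were needed to step past the column.
    public static int skipToNextColumn(Deque<Key> heap, Key current) {
        int steps = 0;
        Key next;
        while ((next = heap.peek()) != null && sameRowCol(current, next)) {
            heap.poll(); // this.heap.next() in the real code
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        Deque<Key> heap = new ArrayDeque<>();
        heap.add(new Key("10", "C3", 3)); // several versions of row 10, C3
        heap.add(new Key("10", "C3", 2));
        heap.add(new Key("10", "C3", 1));
        heap.add(new Key("10", "C5", 3)); // the column we actually want
        System.out.println(skipToNextColumn(heap, new Key("10", "C3", 3))); // 3
        System.out.println(heap.peek().col()); // C5
    }
}
```

Each of those steps may pull another scanner (including the one holding the fake key) to the top of the heap, which is the path the bug report follows next.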
[jira] [Commented] (HBASE-19774) incorrect behavior of locateRegionInMeta
[ https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331762#comment-16331762 ]

Sergey Soldatov commented on HBASE-19774:
-----------------------------------------

[~mdrob] That's the interesting part. This incorrect behavior has existed for many releases already. As I mentioned earlier, the exception would still be a TNFE, but with a misleading message: in 1.x it is "A not found, got B"; in 2.x it is a very scary message about corrupted meta. And actually, this was revealed because of HBASE-19775, when the shell received a TNFE wrapped in an UncheckedIOException and so was unable to handle it properly.

> incorrect behavior of locateRegionInMeta
> ----------------------------------------
>
>                 Key: HBASE-19774
>                 URL: https://issues.apache.org/jira/browse/HBASE-19774
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Romil Choksi
>            Assignee: Sergey Soldatov
>            Priority: Major
>             Fix For: 2.0.0-beta-2
>
>         Attachments: HBASE-19774-branch-2.patch, HBASE-19774-wip-branch-2.patch, HBASE-19774.master.patch
>
> When we try to operate on a non-existent table, in some circumstances we get an incorrect report about the non-existent table:
> {noformat}
> ERROR: Region of 'hbase:namespace,,1510363071508.0d8ddea7654f95130959218e9bc9c89c.' is expected in the table of 'nonExistentUsertable', but hbase:meta says it is in the table of 'hbase:namespace'. hbase:meta might be damaged.
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
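The gist of the comment above is that a meta lookup landing on a row from a different table means the requested table simply has no regions, so the client should surface a plain TableNotFoundException rather than a "meta might be damaged" error. A hedged sketch with hypothetical names (not the actual locateRegionInMeta code):

```java
// Hypothetical sketch, NOT the real client code: when the row found in
// hbase:meta belongs to a different table, report the requested table
// as not found instead of claiming meta corruption.
public class MetaLookupCheck {
    public static String locate(String requestedTable, String tableFromMetaRow) {
        if (!requestedTable.equals(tableFromMetaRow)) {
            // The requested table has no regions in meta at all.
            return "TableNotFoundException: " + requestedTable;
        }
        return "region of " + requestedTable;
    }

    public static void main(String[] args) {
        // the scenario from the bug report: scanning past the missing table's
        // rows lands on the next table's region row
        System.out.println(locate("nonExistentUsertable", "hbase:namespace"));
        System.out.println(locate("hbase:namespace", "hbase:namespace"));
    }
}
```

Throwing the right exception type also matters for callers like the shell, which (per the comment) could not unwrap the TNFE from an UncheckedIOException.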
[jira] [Updated] (HBASE-19774) incorrect behavior of locateRegionInMeta
[ https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Soldatov updated HBASE-19774:
------------------------------------
    Attachment: HBASE-19774.master.patch
[jira] [Commented] (HBASE-19774) incorrect behavior of locateRegionInMeta
[ https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331558#comment-16331558 ]

Sergey Soldatov commented on HBASE-19774:
-----------------------------------------

Attached the patch with a test that covers different table names.
[jira] [Updated] (HBASE-19774) incorrect behavior of locateRegionInMeta
[ https://issues.apache.org/jira/browse/HBASE-19774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Soldatov updated HBASE-19774:
------------------------------------
    Attachment: HBASE-19774-branch-2.patch
[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16328342#comment-16328342 ]

Sergey Soldatov commented on HBASE-19805:
-----------------------------------------

An update: an additional check for an empty region location, as well as for the CLOSING state, in the checkSplittable method helps to avoid multiple splits of the same region and also resolves this exception. But now I see that it may get stuck somewhere during the split execution, so I'm trying to find the reason and fix it before submitting a patch.

> NPE in HMaster while issuing a sequence of table splits
> -------------------------------------------------------
>
>                 Key: HBASE-19805
>                 URL: https://issues.apache.org/jira/browse/HBASE-19805
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Josh Elser
>            Assignee: Sergey Soldatov
>            Priority: Critical
>             Fix For: 2.0.0-beta-2
>
> I wrote a toy program to test the client tarball in HBASE-19735. After the first few region splits, I see the following error in the Master log.
> {noformat}
> 2018-01-16 14:07:52,797 INFO [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: Client=jelser//192.168.1.23 split myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23.
> 2018-01-16 14:07:52,797 ERROR [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: Unexpected throwable object
> java.lang.NullPointerException
>   at org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229)
>   at org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175)
>   at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149)
>   at org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59)
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146)
>   at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.<init>(SplitTableRegionProcedure.java:103)
>   at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761)
>   at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626)
>   at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
>   at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> {noformat}
> {code}
>   public static void main(String[] args) throws Exception {
>     Configuration conf = HBaseConfiguration.create();
>     try (Connection conn = ConnectionFactory.createConnection(conf);
>         Admin admin = conn.getAdmin()) {
>       final TableName tn = TableName.valueOf("myTestTable");
>       if (admin.tableExists(tn)) {
>         admin.disableTable(tn);
>         admin.deleteTable(tn);
>       }
>       final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn)
>           .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build())
>           .build();
>       admin.createTable(desc);
>       List<String> splitPoints = new ArrayList<>(16);
>       for (int i = 1; i <= 16; i++) {
>         splitPoints.add(Integer.toString(i, 16));
>       }
>       System.out.println("Splits: " + splitPoints);
>       int numRegions = admin.getRegions(tn).size();
>       for (String splitPoint : splitPoints) {
>         System.out.println("Splitting on " + splitPoint);
>         admin.split(tn, Bytes.toBytes(splitPoint));
>         Thread.sleep(200);
>         int newRegionSize = admin.getRegions(tn).size();
>         while (numRegions == newRegionSize) {
>           Thread.sleep(50);
>           newRegionSize = admin.getRegions(tn).size();
>         }
>       }
> {code}
> At a quick glance, it looks like {{Util.getRegionInfoResponse}} is to blame.
> {code}
>   static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv env,
>       final ServerName regionLocation, final RegionInfo hri, boolean includeBestSplitRow)
>       throws IOException {
>     // TODO: There is no timeout on this controller. Set one!
[jira] [Comment Edited] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327990#comment-16327990 ]

Sergey Soldatov edited comment on HBASE-19805 at 1/16/18 11:27 PM:
-------------------------------------------------------------------

Well, here is a short RCA: the checkSplittable() method relies on Region.isSplittable, which is just a simple check that the region is available (not closing and not closed) and has no references. But the HRegion.closing flag is set only when we actually execute doClose(). At first glance, it would be reasonable to add a check to checkSplittable() that the region state (RegionStateNode) is not CLOSING.

was (Author: sergey.soldatov):
Well, here is a short RCA: the checkSplittable method relies on Region.isSplittable, which is just a simple check that the region is available (not closing and not closed) and has no references. But the HRegion.closing flag is set only when we actually execute doClose(). At first glance, it would be reasonable to add a check that the region state (RegionStateNode) is not CLOSING.

> NPE in HMaster while issuing a sequence of table splits
> -------------------------------------------------------
>
>                 Key: HBASE-19805
>                 URL: https://issues.apache.org/jira/browse/HBASE-19805
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0-beta-1
>            Reporter: Josh Elser
>            Assignee: Sergey Soldatov
>            Priority: Critical
>             Fix For: 2.0.0-beta-2
[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327990#comment-16327990 ] Sergey Soldatov commented on HBASE-19805: - well, here is a short RCA: checkSplittable method relies on Region.isSplittable which is just a simple check that region is available (not closing nor not closed) and has no references. But HRegion.closing flag we set only when we actually execute doClose(). At first glance, it would be reasonable to add a check that the region state (RegionStateNode) is not CLOSING. > NPE in HMaster while issuing a sequence of table splits > --- > > Key: HBASE-19805 > URL: https://issues.apache.org/jira/browse/HBASE-19805 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 2.0.0-beta-1 >Reporter: Josh Elser >Assignee: Sergey Soldatov >Priority: Critical > Fix For: 2.0.0-beta-2 > > > I wrote a toy program to test the client tarball in HBASE-19735. After the > first few region splits, I see the following error in the Master log. > {noformat} > 2018-01-16 14:07:52,797 INFO > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: > Client=jelser//192.168.1.23 split > myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23. 
> 2018-01-16 14:07:52,797 ERROR > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: > Unexpected throwable object > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149) > at > org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761) > at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134) > at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618) > at > org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {noformat} > {code} > public static void main(String[] args) throws Exception { > Configuration conf = HBaseConfiguration.create(); > try (Connection conn = ConnectionFactory.createConnection(conf); > Admin admin = conn.getAdmin()) { > final TableName tn = 
TableName.valueOf("myTestTable"); > if (admin.tableExists(tn)) { > admin.disableTable(tn); > admin.deleteTable(tn); > } > final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn) > > .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build()) > .build(); > admin.createTable(desc); > List<String> splitPoints = new ArrayList<>(16); > for (int i = 1; i <= 16; i++) { > splitPoints.add(Integer.toString(i, 16)); > } > > System.out.println("Splits: " + splitPoints); > int numRegions = admin.getRegions(tn).size(); > for (String splitPoint : splitPoints) { > System.out.println("Splitting on " + splitPoint); > admin.split(tn, Bytes.toBytes(splitPoint)); > Thread.sleep(200); > int newRegionSize = admin.getRegions(tn).size(); > while (numRegions == newRegionSize) { > Thread.sleep(50); > newRegionSize = admin.getRegions(tn).size(); > } > } > {code} > A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame. > {code} > static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv > env, > final ServerName regionLocation, final RegionInfo hri, boolean > includeBestSplitRow) > throws IOException { > // TODO: There is no timeout on this
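The RCA above can be sketched as a small standalone model. This is not the actual HBase code: RegionModel, RegionState, and SplitCheck below are simplified, hypothetical stand-ins for HRegion and RegionStateNode, illustrating why a master-side CLOSING check catches what Region.isSplittable alone misses.

```java
// Minimal model of the proposed guard. In the bug, the HRegion-level
// "closing" flag is only set inside doClose(), so a region that is already
// scheduled to close can still pass the isSplittable() check.
enum RegionState { OPEN, CLOSING, CLOSED }

final class RegionModel {
    final RegionState state;
    final boolean hasReferences;

    RegionModel(RegionState state, boolean hasReferences) {
        this.state = state;
        this.hasReferences = hasReferences;
    }

    // Existing check: region is available and has no reference files.
    boolean isSplittable() {
        return state == RegionState.OPEN && !hasReferences;
    }
}

final class SplitCheck {
    // Proposed addition: also consult the master-side region state, so a
    // parent of an in-flight split (state CLOSING) is rejected before we
    // try to contact a location that may already be null.
    static boolean checkSplittable(RegionModel region) {
        if (region.state == RegionState.CLOSING) {
            return false;
        }
        return region.isSplittable();
    }
}
```

The point of the extra branch is that the master's view of the state transitions (CLOSING) is available earlier than the region server's own closing flag.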
[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327901#comment-16327901 ] Sergey Soldatov commented on HBASE-19805: - [~stack] yep, that's exactly what I expected. Let me dig into why it happens on my own. [~elserj] You have an error in your code: {noformat} Thread.sleep(50); newRegionSize = admin.getRegions(tn).size(); {noformat} It should be numRegions, not newRegionSize. Otherwise the check is valid only for the first split, and that's why it runs splits for the same region without stopping. > NPE in HMaster while issuing a sequence of table splits > --- > > Key: HBASE-19805 > URL: https://issues.apache.org/jira/browse/HBASE-19805 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 2.0.0-beta-1 >Reporter: Josh Elser >Assignee: Sergey Soldatov >Priority: Critical > Fix For: 2.0.0-beta-2 > > > I wrote a toy program to test the client tarball in HBASE-19735. After the > first few region splits, I see the following error in the Master log. > {noformat} > 2018-01-16 14:07:52,797 INFO > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: > Client=jelser//192.168.1.23 split > myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23. 
> 2018-01-16 14:07:52,797 ERROR > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: > Unexpected throwable object > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149) > at > org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761) > at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134) > at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618) > at > org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {noformat} > {code} > public static void main(String[] args) throws Exception { > Configuration conf = HBaseConfiguration.create(); > try (Connection conn = ConnectionFactory.createConnection(conf); > Admin admin = conn.getAdmin()) { > final TableName tn = 
TableName.valueOf("myTestTable"); > if (admin.tableExists(tn)) { > admin.disableTable(tn); > admin.deleteTable(tn); > } > final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn) > > .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build()) > .build(); > admin.createTable(desc); > List splitPoints = new ArrayList<>(16); > for (int i = 1; i <= 16; i++) { > splitPoints.add(Integer.toString(i, 16)); > } > > System.out.println("Splits: " + splitPoints); > int numRegions = admin.getRegions(tn).size(); > for (String splitPoint : splitPoints) { > System.out.println("Splitting on " + splitPoint); > admin.split(tn, Bytes.toBytes(splitPoint)); > Thread.sleep(200); > int newRegionSize = admin.getRegions(tn).size(); > while (numRegions == newRegionSize) { > Thread.sleep(50); > newRegionSize = admin.getRegions(tn).size(); > } > } > {code} > A quick glance, looks like {{Util.getRegionInfoResponse}} is to blame. > {code} > static GetRegionInfoResponse getRegionInfoResponse(final MasterProcedureEnv > env, > final ServerName regionLocation, final RegionInfo hri, boolean > includeBestSplitRow) > throws IOException { > // TODO: There
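The loop fix pointed out above can be sketched as follows. This is a standalone model, not the toy program itself: the live Admin.getRegions(tn).size() call is replaced with an injected IntSupplier (a hypothetical stand-in) so the corrected wait logic can run without a cluster.

```java
import java.util.function.IntSupplier;

// Corrected wait loop from the toy program. The original compared against
// the region count taken before the FIRST split, so later iterations never
// actually waited and re-split the same region.
final class SplitWaiter {
    static int waitForSplit(int numRegions, IntSupplier regionCount) {
        int newRegionSize = regionCount.getAsInt();
        while (numRegions == newRegionSize) {
            try {
                Thread.sleep(50); // poll until the split shows up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            newRegionSize = regionCount.getAsInt();
        }
        // The fix: return the fresh count so the caller carries it into the
        // next iteration as numRegions, instead of reusing the stale value.
        return newRegionSize;
    }
}
```

In the toy program this corresponds to assigning the result back to numRegions at the end of each loop iteration.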
[jira] [Commented] (HBASE-19805) NPE in HMaster while issuing a sequence of table splits
[ https://issues.apache.org/jira/browse/HBASE-19805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327873#comment-16327873 ] Sergey Soldatov commented on HBASE-19805: - [~stack] almost. Actually, that happens when one split is already in progress. We are closing the parent (so we are setting the location to null), and if at this time we try to check whether this region is splittable, we hit this problem. I'm not sure yet why we allow splitting the same region many times. From my log, the scheduled splits are: {noformat} parent=af7ddfb3943627b825ddfe3fedb27590, daughterA=fde4b311dd76909e05cd57f2d19a8ebc, daughterB=4381a4dd9da46c6e7ce91ab6419fb708 parent=af7ddfb3943627b825ddfe3fedb27590, daughterA=38d5fffbe693017be0d2fcc97eec3e3e, daughterB=e76bfc1dddfca511c00dfd3477dc003d parent=af7ddfb3943627b825ddfe3fedb27590, daughterA=ef89483bc4117a31536e1c25def4f64e, daughterB=ccf1c2f5f43f478af438b6c3f2ca7ef5 {noformat} And only the first one actually happens. > NPE in HMaster while issuing a sequence of table splits > --- > > Key: HBASE-19805 > URL: https://issues.apache.org/jira/browse/HBASE-19805 > Project: HBase > Issue Type: Bug > Components: master >Affects Versions: 2.0.0-beta-1 >Reporter: Josh Elser >Assignee: Sergey Soldatov >Priority: Critical > Fix For: 2.0.0-beta-2 > > > I wrote a toy program to test the client tarball in HBASE-19735. After the > first few region splits, I see the following error in the Master log. > {noformat} > 2018-01-16 14:07:52,797 INFO > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] master.HMaster: > Client=jelser//192.168.1.23 split > myTestTable,1,1516129669054.8313b755f74092118f9dd30a4190ee23. 
> 2018-01-16 14:07:52,797 ERROR > [RpcServer.default.FPBQ.Fifo.handler=28,queue=1,port=16000] ipc.RpcServer: > Unexpected throwable object > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.ConnectionUtils.getStubKey(ConnectionUtils.java:229) > at > org.apache.hadoop.hbase.client.ConnectionImplementation.getAdmin(ConnectionImplementation.java:1175) > at > org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.getAdmin(ConnectionUtils.java:149) > at > org.apache.hadoop.hbase.master.assignment.Util.getRegionInfoResponse(Util.java:59) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:146) > at > org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:103) > at > org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:761) > at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1626) > at > org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134) > at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1618) > at > org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:778) > at > org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304) > {noformat} > {code} > public static void main(String[] args) throws Exception { > Configuration conf = HBaseConfiguration.create(); > try (Connection conn = ConnectionFactory.createConnection(conf); > Admin admin = conn.getAdmin()) { > final TableName tn = 
TableName.valueOf("myTestTable"); > if (admin.tableExists(tn)) { > admin.disableTable(tn); > admin.deleteTable(tn); > } > final TableDescriptor desc = TableDescriptorBuilder.newBuilder(tn) > > .addColumnFamily(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("f1")).build()) > .build(); > admin.createTable(desc); > List splitPoints = new ArrayList<>(16); > for (int i = 1; i <= 16; i++) { > splitPoints.add(Integer.toString(i, 16)); > } > > System.out.println("Splits: " + splitPoints); > int numRegions = admin.getRegions(tn).size(); > for (String splitPoint : splitPoints) { > System.out.println("Splitting on " + splitPoint); > admin.split(tn, Bytes.toBytes(splitPoint)); > Thread.sleep(200); > int newRegionSize = admin.getRegions(tn).size(); > while (numRegions == newRegionSize) { > Thread.sleep(50); >
[jira] [Updated] (HBASE-19775) hbase shell doesn't handle the exceptions that are wrapped in java.io.UncheckedIOException
[ https://issues.apache.org/jira/browse/HBASE-19775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Soldatov updated HBASE-19775: Attachment: HBASE-19775-2-branch-2.patch The updated version of the patch. Moved getCause before the individual command handlers. As for the UT, honestly speaking I'm not sure how to check that the upper-level shell commands handle the exception correctly. > hbase shell doesn't handle the exceptions that are wrapped in > java.io.UncheckedIOException > -- > > Key: HBASE-19775 > URL: https://issues.apache.org/jira/browse/HBASE-19775 > Project: HBase > Issue Type: Bug > Components: shell >Affects Versions: 2.0.0-beta-1 >Reporter: Sergey Soldatov >Assignee: Sergey Soldatov > Fix For: 2.0.0-beta-2 > > Attachments: HBASE-19775-2-branch-2.patch, HBASE-19775-branch-2.patch > > > HBase shell doesn't have a notion of UncheckedIOException, so it may not > handle it correctly. For example, if we scan a non-existing table the error > looks weird: > {noformat} > hbase(main):001:0> scan 'a' > ROW > COLUMN+CELL > ERROR: a > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
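The getCause move described above follows the standard unwrapping pattern for java.io.UncheckedIOException. The shell itself is JRuby, so this is only a minimal Java sketch of the idea; CauseUnwrapper is a hypothetical name, not part of the patch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Before matching on exception type, unwrap UncheckedIOException to its
// IOException cause so server-side errors (e.g. a missing table) surface
// with their real message instead of an opaque "ERROR: a".
final class CauseUnwrapper {
    static Throwable unwrap(Throwable t) {
        if (t instanceof UncheckedIOException && t.getCause() != null) {
            return t.getCause();
        }
        return t;
    }
}
```

Doing the unwrap once, before the individual command handlers, means every handler sees the same exception types it saw before IO exceptions started arriving wrapped.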