[jira] [Created] (HBASE-26029) It is not reliable to use nodeDeleted event to track region server's death
Duo Zhang created HBASE-26029: - Summary: It is not reliable to use nodeDeleted event to track region server's death Key: HBASE-26029 URL: https://issues.apache.org/jira/browse/HBASE-26029 Project: HBase Issue Type: Bug Reporter: Duo Zhang Assignee: Duo Zhang When implementing HBASE-26011, [~sunxin] pointed out an interesting scenario: a region server can go up and down between two sync requests, in which case we can not know about the death of that region server. This is a valid point, and when thinking of a solution, I noticed that the current zk implementation has the same problem. Notice that a watcher on zk can only be triggered once, so after zk triggers the watcher and before you set a new watcher, it is possible that a region server goes up and down, and you will miss the nodeDeleted event for that region server. I think the general approach here, which could work for both the master based and the zk based replication tracker, is that we should not rely on the tracker to tell us which region server is dead. Instead, we just provide the list of live region servers, and the upper layer should compare this list with the expected list (for replication, the expected list should be obtained by listing replicators) to detect the dead region servers. -- This message was sent by Atlassian Jira (v8.3.4#803005)
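The list-comparison approach proposed above can be sketched as a simple set difference (a minimal illustration only; the class and method names here are hypothetical, not the actual HBase API):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: detect dead region servers by diffing the expected
// list (e.g. obtained by listing replicators) against the currently live
// servers, instead of relying on one-shot nodeDeleted events that can be
// missed between setting watches.
public class DeadServerDetector {
  public static Set<String> findDeadServers(List<String> expectedReplicators,
      List<String> liveServers) {
    Set<String> dead = new HashSet<>(expectedReplicators);
    // Anything expected but not live must have died since the last check,
    // even if no deletion event was ever observed.
    dead.removeAll(liveServers);
    return dead;
  }

  public static void main(String[] args) {
    List<String> expected = List.of("rs1,16020,100", "rs2,16020,101", "rs3,16020,102");
    List<String> live = List.of("rs1,16020,100", "rs3,16020,102");
    System.out.println(findDeadServers(expected, live)); // [rs2,16020,101]
  }
}
```

Because the comparison is stateless, a region server that bounced between two polls still shows up as dead as long as it left a replicator entry behind.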
[jira] [Resolved] (HBASE-26020) Split TestWALEntryStream.testDifferentCounts out
[ https://issues.apache.org/jira/browse/HBASE-26020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang resolved HBASE-26020. --- Fix Version/s: 2.4.5 2.5.0 3.0.0-alpha-1 Hadoop Flags: Reviewed Resolution: Fixed Pushed to branch-2.4+. Thanks to [~haxiaolin] for reviewing. The code on branch-2.3 is different, so it is not easy to cherry-pick; giving up for now. > Split TestWALEntryStream.testDifferentCounts out > > > Key: HBASE-26020 > URL: https://issues.apache.org/jira/browse/HBASE-26020 > Project: HBase > Issue Type: Improvement > Components: Replication, test >Reporter: Duo Zhang >Assignee: Duo Zhang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.5 > > > It consumes too much time and may cause the whole UT to time out. > And in fact, it should be implemented as a parameterized test. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26028) The viewJson page shows an exception when using TinyLfuBlockCache
Zheng Wang created HBASE-26028: -- Summary: The viewJson page shows an exception when using TinyLfuBlockCache Key: HBASE-26028 URL: https://issues.apache.org/jira/browse/HBASE-26028 Project: HBase Issue Type: Bug Components: UI Reporter: Zheng Wang Assignee: Zheng Wang Some variables in TinyLfuBlockCache should be marked as transient. -- This message was sent by Atlassian Jira (v8.3.4#803005)
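For illustration, the effect of the {{transient}} keyword can be shown with plain Java serialization (the viewJson page itself serializes to JSON, but the principle is the same; {{CacheStats}} and its fields below are invented for this sketch, not the real TinyLfuBlockCache members):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Minimal illustration of 'transient': fields holding non-serializable
// helpers (threads, executors, internal cache machinery) are skipped by
// serialization instead of causing an exception.
public class TransientDemo {
  static class CacheStats implements Serializable {
    long hits = 42;
    // A Thread is not Serializable; without 'transient' this field would
    // make writeObject throw NotSerializableException.
    transient Thread evictionThread = new Thread();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutputStream oos = new ObjectOutputStream(bos);
    oos.writeObject(new CacheStats());
    oos.close();
    CacheStats copy = (CacheStats) new ObjectInputStream(
        new ByteArrayInputStream(bos.toByteArray())).readObject();
    // The data field survives; the transient helper comes back as null.
    System.out.println(copy.hits + " " + copy.evictionThread); // 42 null
  }
}
```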
[jira] [Created] (HBASE-26027) The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException
Zheng Wang created HBASE-26027: -- Summary: The calling of HTable.batch blocked at AsyncRequestFutureImpl.waitUntilDone caused by ArrayStoreException Key: HBASE-26027 URL: https://issues.apache.org/jira/browse/HBASE-26027 Project: HBase Issue Type: Bug Components: Client Reporter: Zheng Wang Assignee: Zheng Wang The batch API of HTable takes a param named results to store results or exceptions; its type is Object[]. If the user passes an array of a narrower type, e.g. org.apache.hadoop.hbase.client.Result[], then an ArrayStoreException occurs in AsyncRequestFutureImpl.updateResult, AsyncRequestFutureImpl.decActionCounter is skipped, and AsyncRequestFutureImpl.waitUntilDone gets stuck checking actionsInProgress again and again, never returning. It would be better to add a cutoff calculated from operationTimeout, instead of depending only on the value of actionsInProgress. {code:java} [ERROR] [2021/06/22 23:23:00,676] hconnection-0x6b927fb-shared-pool3-t1 - id=1 error for test processing localhost,16020,1624343786295 java.lang.ArrayStoreException: org.apache.hadoop.hbase.DoNotRetryIOException at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.updateResult(AsyncRequestFutureImpl.java:1242) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.trySetResultSimple(AsyncRequestFutureImpl.java:1087) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.setError(AsyncRequestFutureImpl.java:1021) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.manageError(AsyncRequestFutureImpl.java:683) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.receiveGlobalFailure(AsyncRequestFutureImpl.java:716) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.access$1500(AsyncRequestFutureImpl.java:69) at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncRequestFutureImpl.java:219) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at 
java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) at java.util.concurrent.FutureTask.run(FutureTask.java) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [INFO ] [2021/06/22 23:23:10,375] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:23:20,378] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:23:30,384] main - #1, waiting for 10 actions to finish on table: [INFO ] [2021/06/22 23:23:40,387] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:23:50,397] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:24:00,400] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:24:10,408] main - #1, waiting for 10 actions to finish on table: test [INFO ] [2021/06/22 23:24:20,413] main - #1, waiting for 10 actions to finish on table: test {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
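The failure mode above comes from Java array covariance and can be reproduced in miniature (this is a standalone sketch, not the actual AsyncRequestFutureImpl code):

```java
// Java arrays are covariant: an Object[] parameter can receive a narrower
// array (e.g. Result[]) at runtime, and storing any element of another type
// (such as an exception object) then throws ArrayStoreException.
public class ArrayStoreDemo {
  static void storeOutcome(Object[] results, int i, Object outcome) {
    // Throws ArrayStoreException if the array's runtime component type
    // rejects 'outcome' -- the store in updateResult fails the same way.
    results[i] = outcome;
  }

  public static void main(String[] args) {
    Object[] results = new String[1]; // caller passed a narrower array type
    try {
      storeOutcome(results, 0, new RuntimeException("server error"));
    } catch (ArrayStoreException e) {
      // If a framework swallows this without decrementing its in-progress
      // counter, waiters block forever -- the hang described in this report.
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

Passing a true Object[] (or catching ArrayStoreException around the store) avoids the skipped counter decrement; the proposed operationTimeout cutoff would additionally bound the wait even if an unexpected exception slips through.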
[jira] [Created] (HBASE-26026) HBase Write may be stuck forever when using CompactingMemStore
chenglei created HBASE-26026: Summary: HBase Write may be stuck forever when using CompactingMemStore Key: HBASE-26026 URL: https://issues.apache.org/jira/browse/HBASE-26026 Project: HBase Issue Type: Bug Components: in-memory-compaction Affects Versions: 2.4.0, 2.3.0 Reporter: chenglei Sometimes I observed that HBase writes might get stuck in my hbase cluster with {{CompactingMemStore}} enabled. I have simulated the problem in a unit test in my PR. The problem is caused by {{CompactingMemStore.checkAndAddToActiveSize}}:
{code:java}
425   private boolean checkAndAddToActiveSize(MutableSegment currActive, Cell cellToAdd,
426       MemStoreSizing memstoreSizing) {
427     if (shouldFlushInMemory(currActive, cellToAdd, memstoreSizing)) {
428       if (currActive.setInMemoryFlushed()) {
429         flushInMemory(currActive);
430         if (setInMemoryCompactionFlag()) {
431           // The thread is dispatched to do in-memory compaction in the background
              ...
{code}
In line 427, if the sum of {{currActive.getDataSize}} and the size of {{cellToAdd}} exceeds {{CompactingMemStore.inmemoryFlushSize}}, then {{currActive}} should be flushed, and {{MutableSegment.setInMemoryFlushed()}} is invoked in line 428:
{code:java}
public boolean setInMemoryFlushed() {
  return flushed.compareAndSet(false, true);
}
{code}
When line 429 is reached, {{currActive.flushed}} is true, and {{CompactingMemStore.flushInMemory}} in turn invokes {{CompactingMemStore.pushActiveToPipeline}}:
{code:java}
protected void pushActiveToPipeline(MutableSegment currActive) {
  if (!currActive.isEmpty()) {
    pipeline.pushHead(currActive);
    resetActive();
  }
}
{code}
In {{CompactingMemStore.pushActiveToPipeline}}, if {{currActive.cellSet}} is empty, then nothing is done.
But because of concurrent writes, and because we add the cell size to {{currActive.getDataSize}} before adding the cell to {{currActive.cellSet}}, it is possible that {{currActive.getDataSize}} can no longer accommodate more cells while {{currActive.cellSet}} is still empty, because pending writes have not yet added their cells to {{currActive.cellSet}}. So now {{currActive.flushed}} is true, new writes are still directed to {{currActive}}, but {{currActive}} can never enter {{flushInMemory}} again, no new active segment can be created, and in the end all writes are stuck. In my opinion, once {{currActive.flushed}} is set to true, it should not be used as the active segment again, and because of concurrent pending writes, only after {{currActive.updatesLock.writeLock()}} is acquired in {{CompactingMemStore.inMemoryCompaction}} can we safely check whether {{currActive}} is empty or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
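The stuck state described above can be modeled deterministically by replaying the problematic interleaving in a single thread (a simplified, hypothetical model of the mechanism, not the real CompactingMemStore code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the race: a writer bumps the segment's data size
// before inserting its cell, so the flush path can observe "size exceeded"
// while the cell set is still empty and silently skip pushing the segment.
public class StuckFlushModel {
  static final long FLUSH_SIZE = 8;
  final AtomicLong dataSize = new AtomicLong();
  final List<String> cellSet = new ArrayList<>();
  final AtomicBoolean flushed = new AtomicBoolean(); // one-shot, like setInMemoryFlushed()
  boolean pushedToPipeline = false;

  void flushIfNeeded() {
    if (dataSize.get() >= FLUSH_SIZE && flushed.compareAndSet(false, true)) {
      // Mirrors pushActiveToPipeline: an empty cell set means nothing is
      // pushed and no fresh active segment is created.
      if (!cellSet.isEmpty()) {
        pushedToPipeline = true;
      }
    }
  }

  public static void main(String[] args) {
    StuckFlushModel m = new StuckFlushModel();
    m.dataSize.addAndGet(10);    // writer step 1: size accounted first
    m.flushIfNeeded();           // flush check runs between the two writer steps
    m.cellSet.add("row1/cf:q");  // writer step 2: cell lands afterwards
    m.flushIfNeeded();           // flushed flag already true: no-op forever
    System.out.println("flushed=" + m.flushed.get()
        + " pushed=" + m.pushedToPipeline); // flushed=true pushed=false
  }
}
```

After the second call, the flag is permanently true but the segment was never pushed, which is why acquiring the updates write lock before deciding emptiness, as suggested above, closes the window.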
[jira] [Resolved] (HBASE-26019) Remove reflections used in HBaseConfiguration.getPassword()
[ https://issues.apache.org/jira/browse/HBASE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi resolved HBASE-26019. --- Resolution: Fixed > Remove reflections used in HBaseConfiguration.getPassword() > --- > > Key: HBASE-26019 > URL: https://issues.apache.org/jira/browse/HBASE-26019 > Project: HBase > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0 > > > HBaseConfiguration.getPassword() uses Hadoop API Configuration.getPassword(). > The API was added in Hadoop 2.6.0. Reflection was used to access the API. > It's time to remove the reflection and invoke the API directly. (HBase 3.0 as > well as 2.x too) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (HBASE-26019) Remove reflections used in HBaseConfiguration.getPassword()
[ https://issues.apache.org/jira/browse/HBASE-26019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Somogyi reopened HBASE-26019: --- Reopening to revert and reapply the commit. The merged commit does not contain the JIRA ID. > Remove reflections used in HBaseConfiguration.getPassword() > --- > > Key: HBASE-26019 > URL: https://issues.apache.org/jira/browse/HBASE-26019 > Project: HBase > Issue Type: Improvement >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0 > > > HBaseConfiguration.getPassword() uses Hadoop API Configuration.getPassword(). > The API was added in Hadoop 2.6.0. Reflection was used to access the API. > It's time to remove the reflection and invoke the API directly. (HBase 3.0 as > well as 2.x too) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-25934) Add username for RegionScannerHolder
[ https://issues.apache.org/jira/browse/HBASE-25934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Jasani resolved HBASE-25934. -- Hadoop Flags: Reviewed Resolution: Fixed > Add username for RegionScannerHolder > > > Key: HBASE-25934 > URL: https://issues.apache.org/jira/browse/HBASE-25934 > Project: HBase > Issue Type: Wish >Reporter: tomscut >Assignee: tomscut >Priority: Minor > Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.5 > > > [HBASE-25542|https://issues.apache.org/jira/browse/HBASE-25542] already added part of the > client information; we can also add the username to RegionScannerHolder. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-26010) Backport HBASE-25703 and HBASE-26002 to branch-2.3
[ https://issues.apache.org/jira/browse/HBASE-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiro Suzuki resolved HBASE-26010. -- Resolution: Won't Fix > Backport HBASE-25703 and HBASE-26002 to branch-2.3 > -- > > Key: HBASE-26010 > URL: https://issues.apache.org/jira/browse/HBASE-26010 > Project: HBase > Issue Type: Improvement > Components: backport >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 2.3.6 > > > Backport HBASE-25703 "Support conditional update in MultiRowMutationEndpoint" > and HBASE-26002 "MultiRowMutationEndpoint should return the result of the > conditional update" to branch-2.3. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-26009) Backport HBASE-25766 "Introduce RegionSplitRestriction that restricts the pattern of the split point" to branch-2.3
[ https://issues.apache.org/jira/browse/HBASE-26009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toshihiro Suzuki resolved HBASE-26009. -- Resolution: Won't Fix > Backport HBASE-25766 "Introduce RegionSplitRestriction that restricts the > pattern of the split point" to branch-2.3 > --- > > Key: HBASE-26009 > URL: https://issues.apache.org/jira/browse/HBASE-26009 > Project: HBase > Issue Type: Sub-task >Reporter: Toshihiro Suzuki >Assignee: Toshihiro Suzuki >Priority: Major > Fix For: 2.3.6 > > > Backport the parent issue to branch-2.3. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-26025) Add a flag to mark if the IOError can be solved by retry in thrift IOError
Yutong Xiao created HBASE-26025: --- Summary: Add a flag to mark if the IOError can be solved by retry in thrift IOError Key: HBASE-26025 URL: https://issues.apache.org/jira/browse/HBASE-26025 Project: HBase Issue Type: Improvement Reporter: Yutong Xiao Assignee: Yutong Xiao Currently, if an HBaseIOException occurs, the thrift client can only get the error message. This makes it inconvenient for the client to construct a retry mechanism to handle the exception. So I added a canRetry flag in IOError to make client side exception handling smarter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
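A client-side retry loop keyed on such a flag could look like the following sketch (the {{IOError}} class here is a stand-in for the generated thrift type, and {{canRetry}} is the proposed field, so names are illustrative):

```java
// Hypothetical sketch of a thrift client retrying only when the server
// marked the error as retryable via the proposed canRetry flag.
public class RetryDemo {
  static class IOError extends Exception {
    final boolean canRetry; // the flag proposed in this issue
    IOError(String msg, boolean canRetry) { super(msg); this.canRetry = canRetry; }
  }

  interface Call<T> { T run() throws IOError; }

  static <T> T callWithRetry(Call<T> call, int maxAttempts) throws IOError {
    IOError last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.run();
      } catch (IOError e) {
        if (!e.canRetry) throw e; // e.g. a DoNotRetryIOException on the server
        last = e;                 // transient failure: try again
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] attempts = {0};
    String v = callWithRetry(() -> {
      if (++attempts[0] < 3) throw new IOError("transient", true);
      return "ok";
    }, 5);
    System.out.println(v + " after " + attempts[0] + " attempts"); // ok after 3 attempts
  }
}
```

Without the flag, the client would have to parse error-message strings to decide whether retrying is worthwhile, which is exactly the inconvenience the issue describes.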