[jira] [Resolved] (HBASE-25803) Add compaction offload switch
[ https://issues.apache.org/jira/browse/HBASE-25803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yulin Niu resolved HBASE-25803. --- Resolution: Fixed > Add compaction offload switch > - > > Key: HBASE-25803 > URL: https://issues.apache.org/jira/browse/HBASE-25803 > Project: HBase > Issue Type: Sub-task >Reporter: Yulin Niu >Assignee: Yulin Niu >Priority: Major > > Add this switch to control whether each regionserver enables the compaction > offload feature. > Also, we keep a boolean value in ZooKeeper as cluster status; a RS takes this > default value at startup. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HBASE-25929) RegionServer JVM crash when compaction
Yi Mei created HBASE-25929: -- Summary: RegionServer JVM crash when compaction Key: HBASE-25929 URL: https://issues.apache.org/jira/browse/HBASE-25929 Project: HBase Issue Type: Bug Components: Compaction Affects Versions: 2.4.3, 2.3.5, 3.0.0-alpha-1, 2.5.0 Reporter: Yi Mei Assignee: Yi Mei Attachments: hs_err_pid27712.log, hs_err_pid28814.log In our cluster, we found region servers may be crashed in several cases. In hs_err_pid27712.log: {code:java} Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) J 2687 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x7f85c987eda7 [0x7f85c987ed40+0x67] J 5884 C1 org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V (62 bytes) @ 0x7f85c93fd904 [0x7f85c93fd780+0x184] J 4274 C1 org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V (73 bytes) @ 0x7f85c9d57a94 [0x7f85c9d574a0+0x5f4] J 5211 C2 org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V (69 bytes) @ 0x7f85ca039a34 [0x7f85ca0399a0+0x94] J 5985 C1 org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I (59 bytes) @ 0x7f85c9296a34 [0x7f85c92964c0+0x574] J 6011 C1 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 bytes) @ 0x7f85c913e094 [0x7f85c913d4c0+0xbd4] J 6004 C1 org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;Ljava/util/function/Function;)Ljava/lang/String; (211 bytes) @ 0x7f85c93737b4 [0x7f85c93722e0+0x14d4] J 6000 C1 org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;)Ljava/lang/String; (10 bytes) @ 0x7f85c9854d14 [0x7f85c9854ba0+0x174] j org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.getMidpoint(Lorg/apache/hadoop/hbase/CellComparator;Lorg/apache/hadoop/hbase/Cell;Lorg/apache/hadoop/hbase/Cell;Lorg/apache/hadoop/hbase/io/hfile/HFileContext;)Lorg/apache/hadoop/hbase/Cell;+132 j 
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock()V+102 j org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.checkBlockBoundary()V+32 j org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.append(Lorg/apache/hadoop/hbase/Cell;)V+77 j org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(Lorg/apache/hadoop/hbase/Cell;)V+20 j org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$FileDetails;Lorg/apache/hadoop/hbase/regionserver/InternalScanner;Lorg/apache/hadoop/hbase/regionserver/CellSink;JZLorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;ZI)Z+318 j org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionRequestImpl;Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$InternalScannerFactory;Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$CellSinkFactory;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+221 j org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionRequestImpl;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+12 j org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+16 j org.apache.hadoop.hbase.regionserver.HStore.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionContext;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+194 {code} In hs_err_pid28814.log: {code:java} Stack: [0x7f6d8e69b000,0x7f6d8e6dc000], sp=0x7f6d8e6d9e88, free space=251k Native frames: (J=compiled Java code, 
j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x747fa0] J 2989 sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 bytes) @ 0x7f751db756e1 [0x7f751db75600+0xe1] j org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36 j org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69 j org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39 j org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31 J 12082 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 bytes) @ 0x7f751ef15fbc [0x7f751ef15dc0+0x1fc] J 16584 C2 org.apache.hadoop.hbase.CellUtil.getCellKeyAs
[jira] [Reopened] (HBASE-25861) Correct the usage of Configuration#addDeprecation
[ https://issues.apache.org/jira/browse/HBASE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang reopened HBASE-25861: - Sorry to reopen. I think we need to understand the behavior change better. CC: [~snemeth] It looks like we have a problem where HBase depends on the old behavior of Hadoop's Configuration class prior to HADOOP-15708. > Correct the usage of Configuration#addDeprecation > - > > Key: HBASE-25861 > URL: https://issues.apache.org/jira/browse/HBASE-25861 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0-alpha-1, 2.5.0 >Reporter: Baiqiang Zhao >Assignee: Baiqiang Zhao >Priority: Major > Fix For: 3.0.0-alpha-1, 2.5.0 > > > When I was solving HBASE-25745 > ([PR3139|https://github.com/apache/hbase/pull/3139]), I found that our use of > the Configuration#addDeprecation API was wrong. > > At present, we call Configuration#addDeprecation in a static block for > the deprecated configuration. But testing shows that this does > not achieve backward compatibility. When a user upgrades HBase and does not > change the deprecated configuration to the new configuration, they will find > that the deprecated configuration does not take effect, which may not be > consistent with expectations. The specific test results can be seen in the PR > above, and we can see that the calling order of Configuration#addDeprecation is > very important. > > Configuration#addDeprecation is a Hadoop API. Looking through the Hadoop > source code, we will find that before the Configuration object is created, the > addDeprecatedKeys() method is called first: > [https://github.com/apache/hadoop/blob/b93e448f9aa66689f1ce5059f6cdce8add130457/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java#L34] > .
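The ordering constraint described above can be illustrated with a toy model. Everything here is a hypothetical stand-in (ToyConfiguration, the key names): it is not Hadoop's real org.apache.hadoop.conf.Configuration and does not claim to model its exact resolution rules, only why registering a deprecation too late fails:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: illustrates why the call order of addDeprecation matters.
class ToyConfiguration {
    private static final Map<String, String> DEPRECATIONS = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    static void addDeprecation(String oldKey, String newKey) {
        DEPRECATIONS.put(oldKey, newKey);
    }

    void set(String key, String value) {
        // Remap deprecated keys at write time; a deprecation registered
        // after this point never sees the value.
        values.put(DEPRECATIONS.getOrDefault(key, key), value);
    }

    String get(String key) {
        return values.get(key);
    }
}

public class DeprecationOrder {
    public static void main(String[] args) {
        // Correct order: register the deprecation BEFORE any values are set.
        ToyConfiguration.addDeprecation("hbase.old.key", "hbase.new.key");
        ToyConfiguration early = new ToyConfiguration();
        early.set("hbase.old.key", "42");
        System.out.println(early.get("hbase.new.key")); // 42

        // Wrong order: the value was stored before the deprecation existed,
        // so the old key was never remapped to the new one.
        ToyConfiguration late = new ToyConfiguration();
        late.set("hbase.other.old", "7");
        ToyConfiguration.addDeprecation("hbase.other.old", "hbase.other.new");
        System.out.println(late.get("hbase.other.new")); // null
    }
}
```

This is the same shape as Hadoop's HdfsConfiguration, which calls addDeprecatedKeys() in a static initializer before any Configuration object is constructed.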
[jira] [Created] (HBASE-25928) TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3
Wei-Chiu Chuang created HBASE-25928: --- Summary: TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3 Key: HBASE-25928 URL: https://issues.apache.org/jira/browse/HBASE-25928 Project: HBase Issue Type: Bug Affects Versions: 3.0.0-alpha-1, 2.5.0 Reporter: Wei-Chiu Chuang The test TestHBaseConfiguration#testDeprecatedConfigurations was added recently by HBASE-25861 to address the usage of the Hadoop Configuration addDeprecations API. However, the API's behavior was changed to fix a bug.
Re: [VOTE] Second release candidate for HBase 2.4.3 (RC1) is available
+1 (non-binding) * Signature: ok * Checksum : ok * Rat check (1.8.0_282): ok - mvn clean apache-rat:check -D hadoop.profile=3.0 * Built from source (1.8.0_282): ok - mvn clean install -D hadoop.profile=3.0 -DskipTests * Unit tests pass (1.8.0_282): ok - mvn package -P runSmallTests -D hadoop.profile=3.0 -Dsurefire.rerunFailingTestsCount=3 * Installed a single node cluster and exercised basic operations with HBase shell commands. Regards, Pankaj On Wed, May 26, 2021 at 8:20 PM Viraj Jasani wrote: > +1 > > * Signature: ok > * Checksum : ok > * Rat check (1.8.0_171): ok > - mvn clean apache-rat:check > * Built from source (1.8.0_171): ok > - mvn clean install -DskipTests > * Nightly looks good > > Brought up 8 node cluster, added 15 B rows across several tables. > No issues reported. > > > On 2021/05/20 19:10:30, Andrew Purtell wrote: > > Please vote on this Apache HBase release candidate, hbase-2.4.3RC1. > > > > The VOTE will remain open for at least 72 hours. > > > > [ ] +1 Release this package as Apache HBase 2.4.3 > > [ ] -1 Do not release this package because ... > > > > The tag to be voted on is 2.4.3RC1: > > > > https://github.com/apache/hbase/tree/2.4.3RC1 > > > > The release files, including signatures, digests, as well as CHANGES.md > > and RELEASENOTES.md included in this RC can be found at: > > > > https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/ > > > > These sources correspond with the git tag "2.4.3RC1" (401b60b217). 
> > > > Temporary Maven artifacts are available in the staging repository: > > > > > https://repository.apache.org/content/repositories/orgapachehbase-1447/ > > > > Artifacts were signed with the apurt...@apache.org key which can be > found > > in: > > > > https://dist.apache.org/repos/dist/release/hbase/KEYS > > > > The API compatibility report for this RC can be found at: > > > > > > > https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/api_compare_2.4.2_to_2.4.3RC1.html > > > > We performed the following successful pre-flight checks before > > announcing the previous RC, RC0: > > > > - Unit tests > > > > - 10 TB Common Crawl data load via IntegrationTestLoadCommonCrawl, > > slowDeterministic policy > > > > To learn more about Apache HBase, please see > > > > http://hbase.apache.org/ > > > > Thanks, > > Your HBase Release Manager > > >
Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions
Thanks Stack! (access given, as google probably told you already). Please keep me honest. On 5/26/21 12:29 PM, Stack wrote: And, what is there currently is a nice write-up S On Wed, May 26, 2021 at 9:26 AM Stack wrote: Can I have comment access please Josh? S On Tue, May 25, 2021 at 8:24 PM Josh Elser wrote: Hi folks, This is a follow-on for the HBASE-24749 discussion on storefile tracking, specifically focusing on where/how do we store the list of files for each Store. I tried to capture my thoughts and the suggestions by Duo and Wellington in this google doc [1]. Please feel free to ask for edit permission (and send me a note if your email address isn't one that I would otherwise recognize :) ) to correct, improve, or expand on any other sections. FWIW, I was initially not super excited about a per-Store file, but, the more I think about it, the more I'm coming around to that idea. I think it will be more "exception-handling", but avoid the long-term operational burden of yet-another-important-system-table. - Josh [1] https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing
[jira] [Resolved] (HBASE-25907) Move StoreFlushContext out of HStore and make it pluggable
[ https://issues.apache.org/jira/browse/HBASE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil resolved HBASE-25907. -- Resolution: Fixed > Move StoreFlushContext out of HStore and make it pluggable > -- > > Key: HBASE-25907 > URL: https://issues.apache.org/jira/browse/HBASE-25907 > Project: HBase > Issue Type: Sub-task >Affects Versions: 3.0.0-alpha-1 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Major > > Currently, StoreFlushContext is directly implemented and instantiated inside > the HStore class. This implementation assumes hfiles are always flushed into a temp > dir first, and its commit implementation moves these files into the actual > family dir. In order to allow for direct flushes (no temp dir and no renames), > we need to make StoreFlushContext implementations pluggable in HStore.
[jira] [Created] (HBASE-25927) Fix the log messages by not stringifying the exceptions in log
Sandeep Pal created HBASE-25927: --- Summary: Fix the log messages by not stringifying the exceptions in log Key: HBASE-25927 URL: https://issues.apache.org/jira/browse/HBASE-25927 Project: HBase Issue Type: Bug Reporter: Sandeep Pal Assignee: Sandeep Pal There are a few places where we stringify exceptions before logging them; instead we should pass them as a parameter so the stack trace is printed in a readable format. For example: https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L175
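The preferred pattern is to hand the throwable to the logger as its own argument rather than concatenating it into the message string. HBase itself logs via slf4j (LOG.warn("...", e)); the sketch below illustrates the same principle with java.util.logging so it stays self-contained:

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LoggingStyle {
    private static final Logger LOG = Logger.getLogger(LoggingStyle.class.getName());

    public static void main(String[] args) {
        IOException e = new IOException("failed to read WAL stream");

        // Anti-pattern: e.toString() carries only the class and message,
        // so the stack trace is lost.
        LOG.warning("Failed to read stream of replication entries " + e);

        // Preferred: pass the throwable itself; the log handler formats the
        // full stack trace.
        LOG.log(Level.WARNING, "Failed to read stream of replication entries", e);
    }
}
```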
Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions
And, what is there currently is a nice write-up S On Wed, May 26, 2021 at 9:26 AM Stack wrote: > Can I have comment access please Josh? > S > > On Tue, May 25, 2021 at 8:24 PM Josh Elser wrote: > >> Hi folks, >> >> This is a follow-on for the HBASE-24749 discussion on storefile >> tracking, specifically focusing on where/how do we store the list of >> files for each Store. >> >> I tried to capture my thoughts and the suggestions by Duo and Wellington >> in this google doc [1]. >> >> Please feel free to ask for edit permission (and send me a note if your >> email address isn't one that I would otherwise recognize :) ) to >> correct, improve, or expand on any other sections. >> >> FWIW, I was initially not super excited about a per-Store file, but, the >> more I think about it, the more I'm coming around to that idea. I think >> it will be more "exception-handling", but avoid the long-term >> operational burden of yet-another-important-system-table. >> >> - Josh >> >> [1] >> >> https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing >> >
Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions
Can I have comment access please Josh? S On Tue, May 25, 2021 at 8:24 PM Josh Elser wrote: > Hi folks, > > This is a follow-on for the HBASE-24749 discussion on storefile > tracking, specifically focusing on where/how do we store the list of > files for each Store. > > I tried to capture my thoughts and the suggestions by Duo and Wellington > in this google doc [1]. > > Please feel free to ask for edit permission (and send me a note if your > email address isn't one that I would otherwise recognize :) ) to > correct, improve, or expand on any other sections. > > FWIW, I was initially not super excited about a per-Store file, but, the > more I think about it, the more I'm coming around to that idea. I think > it will be more "exception-handling", but avoid the long-term > operational burden of yet-another-important-system-table. > > - Josh > > [1] > > https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing >
[jira] [Created] (HBASE-25926) Cleanup MetaTableAccessor references in FavoredNodeBalancer related code
Duo Zhang created HBASE-25926: - Summary: Cleanup MetaTableAccessor references in FavoredNodeBalancer related code Key: HBASE-25926 URL: https://issues.apache.org/jira/browse/HBASE-25926 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang Actually we do not need to use MetaTableAccessor here, and we do not need to put the region info when updating favored nodes. The tests also need some improvements.
[jira] [Resolved] (HBASE-25904) Client integration test is failing on master and branch-2
[ https://issues.apache.org/jira/browse/HBASE-25904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk resolved HBASE-25904. -- Resolution: Fixed > Client integration test is failing on master and branch-2 > - > > Key: HBASE-25904 > URL: https://issues.apache.org/jira/browse/HBASE-25904 > Project: HBase > Issue Type: Task > Components: integration tests >Reporter: Duo Zhang >Assignee: Nick Dimiduk >Priority: Major > > {noformat} > Starting up HBase > 127.0.0.1: Host key verification failed. > running master, logging to > /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/hbase-install/bin/../logs/hbase-jenkins-master-jenkins-hbase7.out > 2021-05-22T04:16:40,142 INFO [main] master.HMaster: STARTING service HMaster > 2021-05-22T04:16:40,149 INFO [main] util.VersionInfo: HBase 3.0.0-SNAPSHOT > 2021-05-22T04:16:40,149 INFO [main] util.VersionInfo: Source code repository > file:///home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/unpacked_src_tarball > revision=Unknown > 2021-05-22T04:16:40,149 INFO [main] util.VersionInfo: Compiled by jenkins on > Sat May 22 04:07:41 UTC 2021 > 2021-05-22T04:16:40,149 INFO [main] util.VersionInfo: From source with > checksum > b6959885410c34f4458efd580213907ca50e84a0a900ea1d465d04a1a9480e520419d51da933547e3f517850a2913a8e8aefbcd6ba9e12589fced980910fb941 > cat: > /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/output-integration/hadoop-3/hbase-conf//regionservers: > No such file or directory > cat: > /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/output-integration/hadoop-3/hbase-conf//regionservers: > No such file or directory > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. 
> retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > retry waiting for hbase to come up. > /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/component/dev-support/hbase_nightly_pseudo-distributed-test.sh: > line 539: 3122 Terminated sleep "${sleep_time}" > Shutting down HBase > no hbase master found > {noformat}
Re: [VOTE] Second release candidate for HBase 2.4.3 (RC1) is available
+1 * Signature: ok * Checksum : ok * Rat check (1.8.0_171): ok - mvn clean apache-rat:check * Built from source (1.8.0_171): ok - mvn clean install -DskipTests * Nightly looks good Brought up 8 node cluster, added 15 B rows across several tables. No issues reported. On 2021/05/20 19:10:30, Andrew Purtell wrote: > Please vote on this Apache HBase release candidate, hbase-2.4.3RC1. > > The VOTE will remain open for at least 72 hours. > > [ ] +1 Release this package as Apache HBase 2.4.3 > [ ] -1 Do not release this package because ... > > The tag to be voted on is 2.4.3RC1: > > https://github.com/apache/hbase/tree/2.4.3RC1 > > The release files, including signatures, digests, as well as CHANGES.md > and RELEASENOTES.md included in this RC can be found at: > > https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/ > > These sources correspond with the git tag "2.4.3RC1" (401b60b217). > > Temporary Maven artifacts are available in the staging repository: > > https://repository.apache.org/content/repositories/orgapachehbase-1447/ > > Artifacts were signed with the apurt...@apache.org key which can be found > in: > > https://dist.apache.org/repos/dist/release/hbase/KEYS > > The API compatibility report for this RC can be found at: > > > https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/api_compare_2.4.2_to_2.4.3RC1.html > > We performed the following successful pre-flight checks before > announcing the previous RC, RC0: > > - Unit tests > > - 10 TB Common Crawl data load via IntegrationTestLoadCommonCrawl, > slowDeterministic policy > > To learn more about Apache HBase, please see > > http://hbase.apache.org/ > > Thanks, > Your HBase Release Manager >
[jira] [Created] (HBASE-25925) FavoredNodeBalancer related code refactoring and improvement
Duo Zhang created HBASE-25925: - Summary: FavoredNodeBalancer related code refactoring and improvement Key: HBASE-25925 URL: https://issues.apache.org/jira/browse/HBASE-25925 Project: HBase Issue Type: Umbrella Reporter: Duo Zhang Will do some code refactoring first before actually moving it to hbase-balancer in HBASE-25649, as some of the improvements can also go to branch-2.
[jira] [Created] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.
Rushabh Shah created HBASE-25924: Summary: Seeing a spike in uncleanlyClosedWALs metric. Key: HBASE-25924 URL: https://issues.apache.org/jira/browse/HBASE-25924 Project: HBase Issue Type: Bug Reporter: Rushabh Shah Assignee: Rushabh Shah Getting the following log line in all of our production clusters when WALEntryStream is dequeuing WAL file. {noformat} 2021-05-02 04:01:30,437 DEBUG [04901996] regionserver.WALEntryStream - Reached the end of WAL file hdfs://. It was not closed cleanly, so we did not parse 8 bytes of data. This is normally ok. {noformat} The 8 bytes are usually the trailer size. While dequeue'ing the WAL file from WALEntryStream, we reset the reader here. [WALEntryStream|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L199-L221] {code:java} private void tryAdvanceEntry() throws IOException { if (checkReader()) { readNextEntryAndSetPosition(); if (currentEntry == null) { // no more entries in this log file - see if log was rolled if (logQueue.getQueue(walGroupId).size() > 1) { // log was rolled // Before dequeueing, we should always get one more attempt at reading. // This is in case more entries came in after we opened the reader, // and a new log was enqueued while we were reading. See HBASE-6758 resetReader(); ---> HERE readNextEntryAndSetPosition(); if (currentEntry == null) { if (checkAllBytesParsed()) { // now we're certain we're done with this log file dequeueCurrentLog(); if (openNextLog()) { readNextEntryAndSetPosition(); } } } } // no other logs, we've simply hit the end of the current open log. Do nothing } } // do nothing if we don't have a WAL Reader (e.g. if there's no logs in queue) } {code} In resetReader, we call the following methods, WALEntryStream#resetReader > ProtobufLogReader#reset ---> ProtobufLogReader#initInternal. 
In ProtobufLogReader#initInternal, we try to create the whole reader object from scratch to see if any new data has been written. We reset all the fields of ProtobufLogReader except for ReaderBase#fileLength. We calculate whether the trailer is present or not depending on fileLength.
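A minimal sketch of that failure mode. ToyWalReader is a hypothetical stand-in (the real fields are ReaderBase#fileLength and ProtobufLogReader's trailer detection), and a trailing 'T' byte plays the role of the WAL trailer; the point is only that resetting without refreshing the cached file length hides a trailer written after the reader opened the file:

```java
import java.util.Arrays;

// Hypothetical toy, not the real HBase classes.
class ToyWalReader {
    private byte[] data = new byte[0]; // stands in for the WAL file on HDFS
    private long fileLength;           // like ReaderBase#fileLength: set at open
    boolean trailerPresent;

    void open() {
        fileLength = data.length;      // length captured once, at open time
        initInternal();
    }

    private void initInternal() {
        // Only bytes within the cached fileLength are examined.
        trailerPresent = fileLength > 0 && data[(int) fileLength - 1] == 'T';
    }

    // Mirrors the reported bug: reset() re-parses the file but does not
    // refresh fileLength, so a trailer appended after open() stays invisible.
    void reset() {
        initInternal();
    }

    // Writer side: the file keeps growing after the reader opened it.
    void append(byte b) {
        data = Arrays.copyOf(data, data.length + 1);
        data[data.length - 1] = b;
    }
}

public class StaleFileLength {
    public static void main(String[] args) {
        ToyWalReader r = new ToyWalReader();
        r.append((byte) 'e');                 // one entry written, no trailer yet
        r.open();                             // reader opens mid-write
        r.append((byte) 'T');                 // writer closes the WAL: trailer written
        r.reset();                            // reader resets to look for new data
        System.out.println(r.trailerPresent); // false: stale length hides the trailer
        r.open();                             // a fresh open refreshes the length
        System.out.println(r.trailerPresent); // true: trailer is now visible
    }
}
```

That stale-length reset is exactly what makes the "It was not closed cleanly" debug line fire for WALs that were in fact closed cleanly.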
[jira] [Created] (HBASE-25923) Region state stuck in PENDING_OPEN
Xiaolin Ha created HBASE-25923: -- Summary: Region state stuck in PENDING_OPEN Key: HBASE-25923 URL: https://issues.apache.org/jira/browse/HBASE-25923 Project: HBase Issue Type: Improvement Components: master, Region Assignment Affects Versions: 1.0.0 Reporter: Xiaolin Ha Assignee: Xiaolin Ha A region will not be reassigned if the master encounters a ConnectionClosingException, and it will then be stuck in the PENDING_OPEN state. Error logs are as follows: {code:java} INFO [jd-data-hbase02.gh.sankuai.com,16000,1621944138744-GeneralBulkAssigner-12] master.AssignmentManager: Unable to communicate with jd-data-hbase15.gh.sankuai.com,16020,1622026221268 in order to assign regions, org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 is closing. Call id=19239, waitTime=1 at org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:289) at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1270) at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227) at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336) at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:25890) at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:798) at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1744) at org.apache.hadoop.hbase.master.GeneralBulkAssigner$SingleServerBulkAssigner.run(GeneralBulkAssigner.java:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by:
org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 is closing. Call id=19239, waitTime=1 at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1083) at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:863) at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:580) {code}
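One way to read the report is that ConnectionClosingException falls outside the set of failures that trigger a reassignment. The sketch below is purely illustrative (shouldReassign and this local ConnectionClosingException are hypothetical, not the actual HBASE-25923 patch or the real class in org.apache.hadoop.hbase.exceptions): it shows the shape of a fix that classifies a closing connection as a retriable connectivity error.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch only; not the real AssignmentManager logic.
public class AssignRetry {
    static class ConnectionClosingException extends IOException {
        ConnectionClosingException(String msg) { super(msg); }
    }

    // Treat a closing connection like other transient connectivity failures,
    // so the region is reassigned instead of staying in PENDING_OPEN.
    static boolean shouldReassign(IOException e) {
        return e instanceof ConnectException
            || e instanceof ConnectionClosingException;
    }

    public static void main(String[] args) {
        System.out.println(shouldReassign(new ConnectionClosingException("Connection is closing"))); // true
        System.out.println(shouldReassign(new IOException("unrelated failure")));                    // false
    }
}
```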
[jira] [Resolved] (HBASE-25898) RS getting aborted due to NPE in Replication WALEntryStream
[ https://issues.apache.org/jira/browse/HBASE-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-25898. Hadoop Flags: Reviewed Resolution: Fixed Pushed to master, branch-2, branch-2.4, branch-2.3, branch-1. Thanks for the reviews. > RS getting aborted due to NPE in Replication WALEntryStream > --- > > Key: HBASE-25898 > URL: https://issues.apache.org/jira/browse/HBASE-25898 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Anoop Sam John >Assignee: Anoop Sam John >Priority: Critical > Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4 > > > The below sequence of events happened in a customer cluster: > An empty WAL file got a roll request. > The close of the file failed on the HDFS side, but as the file had all edits > synced, we continued. > A new WAL file was created and the old one rolled. > This old WAL file got archived to oldWALs. > {code} > 2021-05-13 13:38:46.000 Riding over failed WAL close of > hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678, > cause="Unexpected EOF while trying to read response from server", errors=1; > THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK > 2021-05-13 13:38:46.000 Rolled WAL > /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 > with entries=0, filesize=90 B; new WAL > /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620913126549 > 2021-05-13 13:38:46.000Archiving > hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 > to hdfs://xxx/oldWALs/xxxt%2C16020%2C1620828102351.1620910673678 > 2021-05-13 13:38:46.000 Log > hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 > was moved to hdfs://xxx/oldWALs/xxx%2C16020%2C1620828102351.1620910673678 > {code} > As the file was moved, the WALEntryStream got an IOE and we will recreate > the stream.
> {code} > ReplicationSourceWALReader#run > while (isReaderRunning()) { > try { > entryStream = > new WALEntryStream(logQueue, conf, currentPosition, > source.getWALFileLengthProvider(), > source.getServerWALsBelongTo(), source.getSourceMetrics(), > walGroupId); > while (isReaderRunning()) { > ... > ... > } catch (IOException e) { // stream related > if (handleEofException(e, batch)) { > sleepMultiplier = 1; > } else { > LOG.warn("Failed to read stream of replication entries", e); > if (sleepMultiplier < maxRetriesMultiplier) { > sleepMultiplier++; > } > Threads.sleep(sleepForRetries * sleepMultiplier); > } > } > {code} > eofAutoRecovery is turned off anyway, so it will go to the outer while loop and > create a new WALEntryStream object. > Then we do readWALEntries: > {code} > protected WALEntryBatch readWALEntries(WALEntryStream entryStream, > WALEntryBatch batch) throws IOException, InterruptedException { > Path currentPath = entryStream.getCurrentPath(); > if (!entryStream.hasNext()) { > {code} > Here the currentPath will still be null.
> WALEntryStream#hasNext -> tryAdvanceEntry -> checkReader -> openNextLog > {code} > private boolean openNextLog() throws IOException { > PriorityBlockingQueue queue = logQueue.getQueue(walGroupId); > Path nextPath = queue.peek(); > if (nextPath != null) { > openReader(nextPath); > > private void openReader(Path path) throws IOException { > try { > // Detect if this is a new file, if so get a new reader else > // reset the current reader so that we see the new data > if (reader == null || !getCurrentPath().equals(path)) { > closeReader(); > reader = WALFactory.createReader(fs, path, conf); > seek(); > setCurrentPath(path); > } else { > resetReader(); > } > } catch (FileNotFoundException fnfe) { > handleFileNotFound(path, fnfe); > } catch (RemoteException re) { > IOException ioe = re.unwrapRemoteException(FileNotFoundException.class); > if (!(ioe instanceof FileNotFoundException)) { > throw ioe; > } > handleFileNotFound(path, (FileNotFoundException)ioe); > } catch (LeaseNotRecoveredException lnre) { > // HBASE-15019 the WAL was not closed due to some hiccup. > LOG.warn("Try to recover the WAL lease " + currentPath, lnre); > recoverLease(conf, currentPath); > reader = null; > } catch (NullPointerException npe) { > // Workaround for race condition in HDFS-4380 > // which throws a NPE if we open a file before any data node has the > most recent block > // Just sleep a
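When the stream has just been recreated, currentPath can still be null, so any path comparison along this code path must be null-safe. A generic sketch of the idea (assumption: illustrative only, not the actual HBASE-25898 patch; String stands in for org.apache.hadoop.fs.Path, and the WAL name is hypothetical):

```java
import java.util.Objects;

public class NullSafeCompare {
    public static void main(String[] args) {
        String currentPath = null;              // stream recreated, nothing opened yet
        String nextPath = "wal.1620910673678";  // hypothetical next WAL in the queue

        // currentPath.equals(nextPath) would throw NullPointerException here.
        // Objects.equals, or comparing from the known non-null side, is safe.
        boolean samePath = Objects.equals(currentPath, nextPath);
        System.out.println(samePath); // false, with no NPE
    }
}
```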
[jira] [Created] (HBASE-25922) Disabled sanity checks ignored on snapshot restore
Julian Nodorp created HBASE-25922: - Summary: Disabled sanity checks ignored on snapshot restore Key: HBASE-25922 URL: https://issues.apache.org/jira/browse/HBASE-25922 Project: HBase Issue Type: Bug Components: conf, snapshots Affects Versions: 2.4.2, 2.2.6 Environment: This has been tested in * Google Dataproc running HBase 2.2.6 * Local HBase 2.4.2 Reporter: Julian Nodorp Disabling sanity checks on a table is ignored when restoring snapshots. If this is expected behavior, at least the error message is misleading. h3. Steps to Reproduce # Create a new table {{create 't', 'cf'}} # Add a coprocessor to the newly created table {{alter 't', METHOD => 'table_att', 'coprocessor' => 'coprocessor.jar|com.example.MyCoprocessor|0'}} # Create a snapshot {{snapshot 't', 'snapshot-t'}} # Disable the table to prevent region servers from crashing in the next step {{disable 't'}} # Delete the coprocessor JAR and restart HBase. # Attempting to restore the snapshot leads to a failing sanity check, as expected {{restore_snapshot 'snapshot-t'}} {{ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: coprocessor.jar Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks [...]}} # Disable sanity checks (as described in the error message) and retry {{alter 't', CONFIGURATION => \{'hbase.table.sanity.checks' => 'false'}}} {{restore_snapshot 'snapshot-t'}} h3. Expected Behavior The snapshot is restored. h3. Actual Behavior The same error message as in step 6 is shown.