[jira] [Resolved] (HBASE-25803) Add compaction offload switch

2021-05-26 Thread Yulin Niu (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yulin Niu resolved HBASE-25803.
---
Resolution: Fixed

> Add compaction offload switch
> -
>
> Key: HBASE-25803
> URL: https://issues.apache.org/jira/browse/HBASE-25803
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Yulin Niu
>Assignee: Yulin Niu
>Priority: Major
>
> Add this switch to control whether each regionserver enables the compaction 
> offload feature.
> Also, we keep a boolean value in ZooKeeper as the cluster status; each RS 
> takes this default value on startup.
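> 
> As an illustration, a minimal sketch (hypothetical znode path and class 
> names, not the actual patch) of a regionserver reading the cluster-wide 
> default from ZooKeeper at startup:
> {code:java}
> import java.nio.charset.StandardCharsets;
> 
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooKeeper;
> 
> public class CompactionOffloadSwitch {
>   // Hypothetical znode holding the cluster-wide default.
>   private static final String OFFLOAD_ZNODE = "/hbase/compaction-offload";
> 
>   /** Reads the cluster default; falls back to false if the znode is absent. */
>   public static boolean readClusterDefault(ZooKeeper zk)
>       throws KeeperException, InterruptedException {
>     if (zk.exists(OFFLOAD_ZNODE, false) == null) {
>       return false;
>     }
>     byte[] data = zk.getData(OFFLOAD_ZNODE, false, null);
>     return Boolean.parseBoolean(new String(data, StandardCharsets.UTF_8));
>   }
> }
> {code}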





[jira] [Created] (HBASE-25929) RegionServer JVM crash when compaction

2021-05-26 Thread Yi Mei (Jira)
Yi Mei created HBASE-25929:
--

 Summary: RegionServer JVM crash when compaction
 Key: HBASE-25929
 URL: https://issues.apache.org/jira/browse/HBASE-25929
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 2.4.3, 2.3.5, 3.0.0-alpha-1, 2.5.0
Reporter: Yi Mei
Assignee: Yi Mei
 Attachments: hs_err_pid27712.log, hs_err_pid28814.log

In our cluster, we found that region servers may crash in several cases.

In hs_err_pid27712.log:
{code:java}
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 2687  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 
bytes) @ 0x7f85c987eda7 [0x7f85c987ed40+0x67]
J 5884 C1 
org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V
 (62 bytes) @ 0x7f85c93fd904 [0x7f85c93fd780+0x184]
J 4274 C1 
org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V (73 
bytes) @ 0x7f85c9d57a94 [0x7f85c9d574a0+0x5f4]
J 5211 C2 
org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V
 (69 bytes) @ 0x7f85ca039a34 [0x7f85ca0399a0+0x94]
J 5985 C1 
org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I
 (59 bytes) @ 0x7f85c9296a34 [0x7f85c92964c0+0x574]
J 6011 C1 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 
bytes) @ 0x7f85c913e094 [0x7f85c913d4c0+0xbd4]
J 6004 C1 
org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;Ljava/util/function/Function;)Ljava/lang/String;
 (211 bytes) @ 0x7f85c93737b4 [0x7f85c93722e0+0x14d4]
J 6000 C1 
org.apache.hadoop.hbase.CellUtil.getCellKeyAsString(Lorg/apache/hadoop/hbase/Cell;)Ljava/lang/String;
 (10 bytes) @ 0x7f85c9854d14 [0x7f85c9854ba0+0x174]
j  
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.getMidpoint(Lorg/apache/hadoop/hbase/CellComparator;Lorg/apache/hadoop/hbase/Cell;Lorg/apache/hadoop/hbase/Cell;Lorg/apache/hadoop/hbase/io/hfile/HFileContext;)Lorg/apache/hadoop/hbase/Cell;+132
j  org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.finishBlock()V+102
j  org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.checkBlockBoundary()V+32
j  
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl.append(Lorg/apache/hadoop/hbase/Cell;)V+77
j  
org.apache.hadoop.hbase.regionserver.StoreFileWriter.append(Lorg/apache/hadoop/hbase/Cell;)V+20
j  
org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$FileDetails;Lorg/apache/hadoop/hbase/regionserver/InternalScanner;Lorg/apache/hadoop/hbase/regionserver/CellSink;JZLorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;ZI)Z+318
j  
org.apache.hadoop.hbase.regionserver.compactions.Compactor.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionRequestImpl;Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$InternalScannerFactory;Lorg/apache/hadoop/hbase/regionserver/compactions/Compactor$CellSinkFactory;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+221
j  
org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionRequestImpl;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+12
j  
org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+16
j  
org.apache.hadoop.hbase.regionserver.HStore.compact(Lorg/apache/hadoop/hbase/regionserver/compactions/CompactionContext;Lorg/apache/hadoop/hbase/regionserver/throttle/ThroughputController;Lorg/apache/hadoop/hbase/security/User;)Ljava/util/List;+194
{code}
In hs_err_pid28814.log:
{code:java}
Stack: [0x7f6d8e69b000,0x7f6d8e6dc000],  sp=0x7f6d8e6d9e88,  free 
space=251k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x747fa0]
J 2989  sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (0 
bytes) @ 0x7f751db756e1 [0x7f751db75600+0xe1]
j  
org.apache.hadoop.hbase.util.UnsafeAccess.unsafeCopy(Ljava/lang/Object;JLjava/lang/Object;JJ)V+36
j  
org.apache.hadoop.hbase.util.UnsafeAccess.copy(Ljava/nio/ByteBuffer;I[BII)V+69
j  
org.apache.hadoop.hbase.util.ByteBufferUtils.copyFromBufferToArray([BLjava/nio/ByteBuffer;III)V+39
j  
org.apache.hadoop.hbase.CellUtil.copyQualifierTo(Lorg/apache/hadoop/hbase/Cell;[BI)I+31
J 12082 C2 org.apache.hadoop.hbase.ByteBufferKeyValue.getQualifierArray()[B (5 
bytes) @ 0x7f751ef15fbc [0x7f751ef15dc0+0x1fc]
J 16584 C2 
org.apache.hadoop.hbase.CellUtil.getCellKeyAs

[jira] [Reopened] (HBASE-25861) Correct the usage of Configuration#addDeprecation

2021-05-26 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HBASE-25861:
-

Sorry to reopen. I think we need to understand the behavior change better.

CC: [~snemeth], it looks like we have a problem where HBase depends on the old 
behavior of Hadoop's Configuration class prior to HADOOP-15708.

> Correct the usage of Configuration#addDeprecation
> -
>
> Key: HBASE-25861
> URL: https://issues.apache.org/jira/browse/HBASE-25861
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.5.0
>Reporter: Baiqiang Zhao
>Assignee: Baiqiang Zhao
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> When I was solving HBASE-25745 
> ([PR3139|https://github.com/apache/hbase/pull/3139]), I found that our use of 
> the Configuration#addDeprecation API was wrong.
>  
> At present, we call Configuration#addDeprecation in a static block for each 
> deprecated configuration. But testing showed that this does not provide full 
> backward compatibility. When a user upgrades HBase without changing a 
> deprecated configuration to its new name, they will find that the deprecated 
> configuration does not take effect, which may not match their expectations. 
> The specific test results can be seen in the PR above, and they show that 
> the calling order of Configuration#addDeprecation is very important.
>  
> Configuration#addDeprecation is a Hadoop API. Looking through the Hadoop 
> source code, we find that the addDeprecatedKeys() method is called before 
> the Configuration object is created: 
> [https://github.com/apache/hadoop/blob/b93e448f9aa66689f1ce5059f6cdce8add130457/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/HdfsConfiguration.java#L34].
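> 
> A minimal sketch of the ordering constraint (the key names are invented for 
> illustration): the deprecation must be registered before the configuration 
> is used, which is why Hadoop's HdfsConfiguration does it in a static block 
> that runs ahead of any constructor.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> public class MyConfiguration extends Configuration {
>   static {
>     // Registered early, mirroring HdfsConfiguration#addDeprecatedKeys().
>     Configuration.addDeprecation("my.old.key", "my.new.key");
>   }
> 
>   public static void main(String[] args) {
>     Configuration conf = new MyConfiguration();
>     conf.set("my.old.key", "value");
>     // Because the deprecation was registered before the key was used,
>     // the old key is transparently mapped to the new one.
>     System.out.println(conf.get("my.new.key")); // prints "value"
>   }
> }
> {code}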





[jira] [Created] (HBASE-25928) TestHBaseConfiguration#testDeprecatedConfigurations is broken with Hadoop 3.3

2021-05-26 Thread Wei-Chiu Chuang (Jira)
Wei-Chiu Chuang created HBASE-25928:
---

 Summary: TestHBaseConfiguration#testDeprecatedConfigurations is 
broken with Hadoop 3.3
 Key: HBASE-25928
 URL: https://issues.apache.org/jira/browse/HBASE-25928
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0-alpha-1, 2.5.0
Reporter: Wei-Chiu Chuang


The test TestHBaseConfiguration#testDeprecatedConfigurations was added recently 
by HBASE-25861 to verify the usage of the Hadoop Configuration addDeprecations 
API.

However, the API's behavior was later changed to fix a bug, which breaks this 
test with Hadoop 3.3.





Re: [VOTE] Second release candidate for HBase 2.4.3 (RC1) is available

2021-05-26 Thread Pankaj Kumar
+1 (non-binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_282): ok
 - mvn clean apache-rat:check -D hadoop.profile=3.0
* Built from source (1.8.0_282): ok
 - mvn clean install -D hadoop.profile=3.0 -DskipTests
* Unit tests pass (1.8.0_282): ok
 - mvn package -P runSmallTests -D hadoop.profile=3.0
-Dsurefire.rerunFailingTestsCount=3

* Installed a single node cluster and exercised basic operations with HBase
shell commands.

Regards,
Pankaj

On Wed, May 26, 2021 at 8:20 PM Viraj Jasani  wrote:

> +1
>
> * Signature: ok
> * Checksum : ok
> * Rat check (1.8.0_171): ok
>  - mvn clean apache-rat:check
> * Built from source (1.8.0_171): ok
>  - mvn clean install  -DskipTests
> * Nightly looks good
>
> Brought up 8 node cluster, added 15 B rows across several tables.
> No issues reported.
>
>
> On 2021/05/20 19:10:30, Andrew Purtell  wrote:
> > Please vote on this Apache HBase release candidate, hbase-2.4.3RC1.
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache HBase 2.4.3
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.4.3RC1:
> >
> > https://github.com/apache/hbase/tree/2.4.3RC1
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> > https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/
> >
> > These sources correspond with the git tag "2.4.3RC1" (401b60b217).
> >
> > Temporary Maven artifacts are available in the staging repository:
> >
> >
> https://repository.apache.org/content/repositories/orgapachehbase-1447/
> >
> > Artifacts were signed with the apurt...@apache.org key which can be
> found
> > in:
> >
> > https://dist.apache.org/repos/dist/release/hbase/KEYS
> >
> > The API compatibility report for this RC can be found at:
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/api_compare_2.4.2_to_2.4.3RC1.html
> >
> > We performed the following successful pre-flight checks before
> > announcing the previous RC, RC0:
> >
> > - Unit tests
> >
> > - 10 TB Common Crawl data load via IntegrationTestLoadCommonCrawl,
> >   slowDeterministic policy
> >
> > To learn more about Apache HBase, please see
> >
> > http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
> >
>


Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

2021-05-26 Thread Josh Elser

Thanks Stack! (access given, as google probably told you already).

Please keep me honest.

On 5/26/21 12:29 PM, Stack wrote:

And, what is there currently is a nice write-up
S

On Wed, May 26, 2021 at 9:26 AM Stack  wrote:


Can I have comment access please Josh?
S

On Tue, May 25, 2021 at 8:24 PM Josh Elser  wrote:


Hi folks,

This is a follow-on for the HBASE-24749 discussion on storefile
tracking, specifically focusing on where/how do we store the list of
files for each Store.

I tried to capture my thoughts and the suggestions by Duo and Wellington
in this google doc [1].

Please feel free to ask for edit permission (and send me a note if your
email address isn't one that I would otherwise recognize :) ) to
correct, improve, or expand on any other sections.

FWIW, I was initially not super excited about a per-Store file, but, the
more I think about it, the more I'm coming around to that idea. I think
it will be more "exception-handling", but avoid the long-term
operational burden of yet-another-important-system-table.

- Josh

[1]

https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing







[jira] [Resolved] (HBASE-25907) Move StoreFlushContext out of HStore and make it pluggable

2021-05-26 Thread Wellington Chevreuil (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil resolved HBASE-25907.
--
Resolution: Fixed

> Move StoreFlushContext out of HStore and make it pluggable
> --
>
> Key: HBASE-25907
> URL: https://issues.apache.org/jira/browse/HBASE-25907
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 3.0.0-alpha-1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Major
>
> Currently, StoreFlushContext is directly implemented and instantiated inside 
> the HStore class. This implementation assumes hfiles are always flushed into 
> a temp dir first, and its commit implementation moves these files into the 
> actual family dir. In order to allow direct flushes (no temp dir, no 
> renames), we need to make StoreFlushContext implementations pluggable in 
> HStore.
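> 
> As a hypothetical sketch (the factory interface and names below are invented 
> for illustration, not the actual change), "pluggable" could mean HStore 
> obtaining its flush context from a configurable factory instead of a 
> hard-coded inner class:
> {code:java}
> import java.io.IOException;
> 
> // Minimal shape of the contract HStore relies on.
> interface StoreFlushContext {
>   void flushCache() throws IOException; // write out the hfiles
>   boolean commit() throws IOException;  // make them visible to the store
> }
> 
> // The pluggable piece, chosen via configuration by HStore. The default
> // implementation would keep today's behavior (flush into a temp dir and
> // rename on commit); a "direct flush" implementation would write straight
> // to the family dir, leaving commit() with no rename to do.
> interface StoreFlushContextFactory {
>   StoreFlushContext createFlushContext(long flushSeqId);
> }
> {code}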





[jira] [Created] (HBASE-25927) Fix the log messages by not stringifying the exceptions in log

2021-05-26 Thread Sandeep Pal (Jira)
Sandeep Pal created HBASE-25927:
---

 Summary: Fix the log messages by not stringifying the exceptions 
in log
 Key: HBASE-25927
 URL: https://issues.apache.org/jira/browse/HBASE-25927
 Project: HBase
  Issue Type: Bug
Reporter: Sandeep Pal
Assignee: Sandeep Pal


There are a few places where we stringify exceptions when logging; instead, we 
should pass the exception as a parameter so that the stack trace is printed in 
a proper format.

For example: 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceWALReader.java#L175
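
For illustration, a minimal sketch of the proposed fix (using the SLF4J Logger 
API; the message text is only an example):
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingExample {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingExample.class);

  void handle(Exception e) {
    // Bad: concatenation calls e.toString(), so the stack trace is lost.
    LOG.warn("Failed to read stream of replication entries: " + e);

    // Good: passing the Throwable as the last argument makes the logger
    // print the message followed by the full stack trace.
    LOG.warn("Failed to read stream of replication entries", e);
  }
}
{code}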





Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

2021-05-26 Thread Stack
And, what is there currently is a nice write-up
S

On Wed, May 26, 2021 at 9:26 AM Stack  wrote:

> Can I have comment access please Josh?
> S
>
> On Tue, May 25, 2021 at 8:24 PM Josh Elser  wrote:
>
>> Hi folks,
>>
>> This is a follow-on for the HBASE-24749 discussion on storefile
>> tracking, specifically focusing on where/how do we store the list of
>> files for each Store.
>>
>> I tried to capture my thoughts and the suggestions by Duo and Wellington
>> in this google doc [1].
>>
>> Please feel free to ask for edit permission (and send me a note if your
>> email address isn't one that I would otherwise recognize :) ) to
>> correct, improve, or expand on any other sections.
>>
>> FWIW, I was initially not super excited about a per-Store file, but, the
>> more I think about it, the more I'm coming around to that idea. I think
>> it will be more "exception-handling", but avoid the long-term
>> operational burden of yet-another-important-system-table.
>>
>> - Josh
>>
>> [1]
>>
>> https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing
>>
>


Re: [DISCUSS] Breakout discussion on storefile tracking storage solutions

2021-05-26 Thread Stack
Can I have comment access please Josh?
S

On Tue, May 25, 2021 at 8:24 PM Josh Elser  wrote:

> Hi folks,
>
> This is a follow-on for the HBASE-24749 discussion on storefile
> tracking, specifically focusing on where/how do we store the list of
> files for each Store.
>
> I tried to capture my thoughts and the suggestions by Duo and Wellington
> in this google doc [1].
>
> Please feel free to ask for edit permission (and send me a note if your
> email address isn't one that I would otherwise recognize :) ) to
> correct, improve, or expand on any other sections.
>
> FWIW, I was initially not super excited about a per-Store file, but, the
> more I think about it, the more I'm coming around to that idea. I think
> it will be more "exception-handling", but avoid the long-term
> operational burden of yet-another-important-system-table.
>
> - Josh
>
> [1]
>
> https://docs.google.com/document/d/1yzjvQvQfnT-M8ZgKdcQNedF8HssTnQR2loPkZtlJGVg/edit?usp=sharing
>


[jira] [Created] (HBASE-25926) Cleanup MetaTableAccessor references in FavoredNodeBalancer related code

2021-05-26 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-25926:
-

 Summary: Cleanup MetaTableAccessor references in 
FavoredNodeBalancer related code
 Key: HBASE-25926
 URL: https://issues.apache.org/jira/browse/HBASE-25926
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


Actually, we do not need to use MetaTableAccessor here, and we do not need to 
put the region info when updating favored nodes.

The tests also need some improvements.





[jira] [Resolved] (HBASE-25904) Client integration test is failing on master and branch-2

2021-05-26 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-25904.
--
Resolution: Fixed

> Client integration test is failing on master and branch-2
> -
>
> Key: HBASE-25904
> URL: https://issues.apache.org/jira/browse/HBASE-25904
> Project: HBase
>  Issue Type: Task
>  Components: integration tests
>Reporter: Duo Zhang
>Assignee: Nick Dimiduk
>Priority: Major
>
> {noformat}
> Starting up HBase
> 127.0.0.1: Host key verification failed.
> running master, logging to 
> /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/hbase-install/bin/../logs/hbase-jenkins-master-jenkins-hbase7.out
> 2021-05-22T04:16:40,142 INFO  [main] master.HMaster: STARTING service HMaster
> 2021-05-22T04:16:40,149 INFO  [main] util.VersionInfo: HBase 3.0.0-SNAPSHOT
> 2021-05-22T04:16:40,149 INFO  [main] util.VersionInfo: Source code repository 
> file:///home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/unpacked_src_tarball
>  revision=Unknown
> 2021-05-22T04:16:40,149 INFO  [main] util.VersionInfo: Compiled by jenkins on 
> Sat May 22 04:07:41 UTC 2021
> 2021-05-22T04:16:40,149 INFO  [main] util.VersionInfo: From source with 
> checksum 
> b6959885410c34f4458efd580213907ca50e84a0a900ea1d465d04a1a9480e520419d51da933547e3f517850a2913a8e8aefbcd6ba9e12589fced980910fb941
> cat: 
> /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/output-integration/hadoop-3/hbase-conf//regionservers:
>  No such file or directory
> cat: 
> /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/output-integration/hadoop-3/hbase-conf//regionservers:
>  No such file or directory
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
>   retry waiting for hbase to come up.
> /home/jenkins/jenkins-home/workspace/HBase_HBase_Nightly_master/component/dev-support/hbase_nightly_pseudo-distributed-test.sh:
>  line 539:  3122 Terminated  sleep "${sleep_time}"
> Shutting down HBase
> no hbase master found
> {noformat}





Re: [VOTE] Second release candidate for HBase 2.4.3 (RC1) is available

2021-05-26 Thread Viraj Jasani
+1

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_171): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_171): ok
 - mvn clean install  -DskipTests
* Nightly looks good

Brought up 8 node cluster, added 15 B rows across several tables.
No issues reported.


On 2021/05/20 19:10:30, Andrew Purtell  wrote: 
> Please vote on this Apache HBase release candidate, hbase-2.4.3RC1.
> 
> The VOTE will remain open for at least 72 hours.
> 
> [ ] +1 Release this package as Apache HBase 2.4.3
> [ ] -1 Do not release this package because ...
> 
> The tag to be voted on is 2.4.3RC1:
> 
> https://github.com/apache/hbase/tree/2.4.3RC1
> 
> The release files, including signatures, digests, as well as CHANGES.md
> and RELEASENOTES.md included in this RC can be found at:
> 
> https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/
> 
> These sources correspond with the git tag "2.4.3RC1" (401b60b217).
> 
> Temporary Maven artifacts are available in the staging repository:
> 
> https://repository.apache.org/content/repositories/orgapachehbase-1447/
> 
> Artifacts were signed with the apurt...@apache.org key which can be found
> in:
> 
> https://dist.apache.org/repos/dist/release/hbase/KEYS
> 
> The API compatibility report for this RC can be found at:
> 
> 
> https://dist.apache.org/repos/dist/dev/hbase/2.4.3RC1/api_compare_2.4.2_to_2.4.3RC1.html
> 
> We performed the following successful pre-flight checks before
> announcing the previous RC, RC0:
> 
> - Unit tests
> 
> - 10 TB Common Crawl data load via IntegrationTestLoadCommonCrawl,
>   slowDeterministic policy
> 
> To learn more about Apache HBase, please see
> 
> http://hbase.apache.org/
> 
> Thanks,
> Your HBase Release Manager
> 


[jira] [Created] (HBASE-25925) FavoredNodeBalancer related code refactoring and improvement

2021-05-26 Thread Duo Zhang (Jira)
Duo Zhang created HBASE-25925:
-

 Summary: FavoredNodeBalancer related code refactoring and 
improvement
 Key: HBASE-25925
 URL: https://issues.apache.org/jira/browse/HBASE-25925
 Project: HBase
  Issue Type: Umbrella
Reporter: Duo Zhang


Will do some code refactoring first before actually moving it to hbase-balancer 
in HBASE-25649, as some of the improvements can also go to branch-2.





[jira] [Created] (HBASE-25924) Seeing a spike in uncleanlyClosedWALs metric.

2021-05-26 Thread Rushabh Shah (Jira)
Rushabh Shah created HBASE-25924:


 Summary: Seeing a spike in uncleanlyClosedWALs metric.
 Key: HBASE-25924
 URL: https://issues.apache.org/jira/browse/HBASE-25924
 Project: HBase
  Issue Type: Bug
Reporter: Rushabh Shah
Assignee: Rushabh Shah


Getting the following log line in all of our production clusters when 
WALEntryStream is dequeuing a WAL file.

{noformat}
 2021-05-02 04:01:30,437 DEBUG [04901996] regionserver.WALEntryStream - Reached 
the end of WAL file hdfs://. It was not closed cleanly, so we 
did not parse 8 bytes of data. This is normally ok.
{noformat}
The 8 bytes are usually the trailer size.

While dequeuing the WAL file from WALEntryStream, we reset the reader here:
[WALEntryStream|https://github.com/apache/hbase/blob/branch-1/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/WALEntryStream.java#L199-L221]

{code:java}
  private void tryAdvanceEntry() throws IOException {
    if (checkReader()) {
      readNextEntryAndSetPosition();
      if (currentEntry == null) { // no more entries in this log file - see if log was rolled
        if (logQueue.getQueue(walGroupId).size() > 1) { // log was rolled
          // Before dequeueing, we should always get one more attempt at reading.
          // This is in case more entries came in after we opened the reader,
          // and a new log was enqueued while we were reading. See HBASE-6758
          resetReader(); // ---> HERE
          readNextEntryAndSetPosition();
          if (currentEntry == null) {
            if (checkAllBytesParsed()) { // now we're certain we're done with this log file
              dequeueCurrentLog();
              if (openNextLog()) {
                readNextEntryAndSetPosition();
              }
            }
          }
        } // no other logs, we've simply hit the end of the current open log. Do nothing
      }
    }
    // do nothing if we don't have a WAL Reader (e.g. if there's no logs in queue)
  }
{code}

In resetReader, we call the following methods: WALEntryStream#resetReader -> 
ProtobufLogReader#reset -> ProtobufLogReader#initInternal.
In ProtobufLogReader#initInternal, we try to create the whole reader object 
from scratch to see if any new data has been written.
We reset all the fields of ProtobufLogReader except for ReaderBase#fileLength.
We calculate whether the trailer is present or not depending on fileLength.
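
A simplified sketch of the suspected consequence (illustrative only, not the 
real classes): since reset never refreshes the cached length, the trailer 
probe runs against a stale offset.
{code:java}
// Field and method names are abbreviated from ReaderBase/ProtobufLogReader
// for illustration.
class WalReaderSketch {
  private long fileLength; // captured at open time, not refreshed by reset()

  void open(long lengthAtOpen) {
    this.fileLength = lengthAtOpen;
  }

  void reset() {
    // initInternal() rebuilds the reader state here, but fileLength keeps
    // the value captured in open(), even if the file has since grown.
  }

  boolean trailerPresent(long trailerSize) {
    // A trailer written after open() lies beyond the stale fileLength, so
    // it is treated as absent and the WAL is counted as uncleanly closed.
    return fileLength > trailerSize;
  }
}
{code}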





[jira] [Created] (HBASE-25923) Region state stuck in PENDING_OPEN

2021-05-26 Thread Xiaolin Ha (Jira)
Xiaolin Ha created HBASE-25923:
--

 Summary: Region state stuck in PENDING_OPEN
 Key: HBASE-25923
 URL: https://issues.apache.org/jira/browse/HBASE-25923
 Project: HBase
  Issue Type: Improvement
  Components: master, Region Assignment
Affects Versions: 1.0.0
Reporter: Xiaolin Ha
Assignee: Xiaolin Ha


A region will not be reassigned if the assignment encounters a 
ConnectionClosingException, and it will then be stuck in the PENDING_OPEN 
state. Error logs are as follows:
{code:java}
INFO  
[jd-data-hbase02.gh.sankuai.com,16000,1621944138744-GeneralBulkAssigner-12] 
master.AssignmentManager: Unable to communicate with 
jd-data-hbase15.gh.sankuai.com,16020,1622026221268 in order to assign regions, 
org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to 
jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 failed on local exception: 
org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to 
jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 is closing. Call id=19239, 
waitTime=1
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.wrapException(AbstractRpcClient.java:289)
        at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1270)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at 
org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$BlockingStub.openRegion(AdminProtos.java:25890)
        at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:798)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1744)
        at 
org.apache.hadoop.hbase.master.GeneralBulkAssigner$SingleServerBulkAssigner.run(GeneralBulkAssigner.java:203)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: 
Connection to jd-data-hbase15.gh.sankuai.com/10.78.96.166:16020 is closing. 
Call id=19239, waitTime=1
        at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1083)
        at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:863)
        at 
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:580)
{code}





[jira] [Resolved] (HBASE-25898) RS getting aborted due to NPE in Replication WALEntryStream

2021-05-26 Thread Anoop Sam John (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-25898.

Hadoop Flags: Reviewed
  Resolution: Fixed

Pushed to master, branch-2, branch-2.4, branch-2.3, branch-1
Thanks for the reviews.

> RS getting aborted due to NPE in Replication WALEntryStream
> ---
>
> Key: HBASE-25898
> URL: https://issues.apache.org/jira/browse/HBASE-25898
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 1.7.0, 2.5.0, 2.3.6, 2.4.4
>
>
> Below is the sequence of events that happened in a customer cluster:
> An empty WAL file got a roll request.
> The close of the file failed on the HDFS side, but as the file had all edits 
> synced, we continued.
> A new WAL file was created and the old one was rolled.
> This old WAL file got archived to oldWALs.
> {code}
> 2021-05-13 13:38:46.000   Riding over failed WAL close of 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678,
>  cause="Unexpected EOF while trying to read response from server", errors=1; 
> THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> 2021-05-13 13:38:46.000   Rolled WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 
> with entries=0, filesize=90 B; new WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620913126549
> 2021-05-13 13:38:46.000Archiving 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  to hdfs://xxx/oldWALs/xxxt%2C16020%2C1620828102351.1620910673678
> 2021-05-13 13:38:46.000   Log 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  was moved to hdfs://xxx/oldWALs/xxx%2C16020%2C1620828102351.1620910673678
> {code}
> As the file was moved, the WALEntryStream got an IOE and we will recreate 
> the stream.
> {code}
> // ReplicationSourceWALReader#run
> while (isReaderRunning()) {
>   try {
>     entryStream = new WALEntryStream(logQueue, conf, currentPosition,
>         source.getWALFileLengthProvider(), source.getServerWALsBelongTo(),
>         source.getSourceMetrics(), walGroupId);
>     while (isReaderRunning()) {
>       ...
>       ...
>   } catch (IOException e) { // stream related
>     if (handleEofException(e, batch)) {
>       sleepMultiplier = 1;
>     } else {
>       LOG.warn("Failed to read stream of replication entries", e);
>       if (sleepMultiplier < maxRetriesMultiplier) {
>         sleepMultiplier++;
>       }
>       Threads.sleep(sleepForRetries * sleepMultiplier);
>     }
>   }
> }
> {code}
> eofAutoRecovery is turned off anyway, so it will go to the outer while loop 
> and create a new WALEntryStream object.
> Then we do readWALEntries:
> {code}
> protected WALEntryBatch readWALEntries(WALEntryStream entryStream,
>     WALEntryBatch batch) throws IOException, InterruptedException {
>   Path currentPath = entryStream.getCurrentPath();
>   if (!entryStream.hasNext()) {
> {code}
> Here the currentPath will still be null.
> WALEntryStream#hasNext -> tryAdvanceEntry -> checkReader -> openNextLog
> {code}
> private boolean openNextLog() throws IOException {
>   PriorityBlockingQueue<Path> queue = logQueue.getQueue(walGroupId);
>   Path nextPath = queue.peek();
>   if (nextPath != null) {
>     openReader(nextPath);
> 
> private void openReader(Path path) throws IOException {
>   try {
>     // Detect if this is a new file, if so get a new reader else
>     // reset the current reader so that we see the new data
>     if (reader == null || !getCurrentPath().equals(path)) {
>       closeReader();
>       reader = WALFactory.createReader(fs, path, conf);
>       seek();
>       setCurrentPath(path);
>     } else {
>       resetReader();
>     }
>   } catch (FileNotFoundException fnfe) {
>     handleFileNotFound(path, fnfe);
>   } catch (RemoteException re) {
>     IOException ioe = re.unwrapRemoteException(FileNotFoundException.class);
>     if (!(ioe instanceof FileNotFoundException)) {
>       throw ioe;
>     }
>     handleFileNotFound(path, (FileNotFoundException) ioe);
>   } catch (LeaseNotRecoveredException lnre) {
>     // HBASE-15019 the WAL was not closed due to some hiccup.
>     LOG.warn("Try to recover the WAL lease " + currentPath, lnre);
>     recoverLease(conf, currentPath);
>     reader = null;
>   } catch (NullPointerException npe) {
>     // Workaround for race condition in HDFS-4380
>     // which throws a NPE if we open a file before any data node has the
>     // most recent block
>     // Just sleep a

[jira] [Created] (HBASE-25922) Disabled sanity checks ignored on snapshot restore

2021-05-26 Thread Julian Nodorp (Jira)
Julian Nodorp created HBASE-25922:
-

 Summary: Disabled sanity checks ignored on snapshot restore
 Key: HBASE-25922
 URL: https://issues.apache.org/jira/browse/HBASE-25922
 Project: HBase
  Issue Type: Bug
  Components: conf, snapshots
Affects Versions: 2.4.2, 2.2.6
 Environment: This has been tested in
 * Google Dataproc running HBase 2.2.6
 * Local HBase 2.4.2
Reporter: Julian Nodorp


Disabling sanity checks on a table is ignored when restoring snapshots. If this 
is expected behavior, then at least the error message is misleading.
h3. Steps to Reproduce
 # Create a new table
{{create 't', 'cf'}}
 # Add a coprocessor to the newly created table
{{alter 't', METHOD => 'table_att', 'coprocessor' => 
'coprocessor.jar|com.example.MyCoprocessor|0'}}
 # Create a snapshot
{{snapshot 't', 'snapshot-t'}}
 # Disable the table to prevent region servers from crashing in the next step
{{disable 't'}}
 # Delete the coprocessor JAR and restart HBase.
 # Attempting to restore the snapshot leads to a failing sanity check, as 
expected
{{restore_snapshot 'snapshot-t'}}
{{ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: coprocessor.jar Set 
hbase.table.sanity.checks to false at conf or table descriptor if you want to 
bypass sanity checks [...]}}
 # Disable sanity checks (as described in the error message) and retry
{{alter 't', CONFIGURATION => \{'hbase.table.sanity.checks' => 'false'}}}
{{restore_snapshot 'snapshot-t'}}

h3. Expected Behavior

The snapshot is restored.
h3. Actual Behavior

The same error message as in step 6 is shown.


