[jira] [Resolved] (HBASE-25482) Improve SimpleRegionNormalizer#getAverageRegionSizeMb

2021-04-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-25482.
--
Resolution: Fixed

> Improve SimpleRegionNormalizer#getAverageRegionSizeMb
> -
>
> Key: HBASE-25482
> URL: https://issues.apache.org/jira/browse/HBASE-25482
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Baiqiang Zhao
> Assignee: Baiqiang Zhao
> Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.3.6, 2.4.2
>
>
> If NormalizerTargetRegionSize is set on the table, we take
> NormalizerTargetRegionSize as avgRegionSize and return it, so the table's
> totalSizeMb does not always need to be calculated.
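A minimal sketch of the idea in the description, written in Python rather than HBase's Java: when a target region size is configured, return it before any per-region sizes are read. The function name, its parameters, and the callable that supplies region sizes are hypothetical and are not the SimpleRegionNormalizer API.

```python
from typing import Callable, Sequence


def average_region_size_mb(
    region_sizes_mb: Callable[[], Sequence[int]],
    target_region_size_mb: int = 0,
    target_region_count: int = 0,
) -> float:
    """Return the average region size, summing sizes only when needed."""
    # A configured target size short-circuits: no per-region sizes are read.
    if target_region_size_mb > 0:
        return float(target_region_size_mb)

    # Only this path pays for collecting and summing the region sizes.
    sizes = region_sizes_mb()
    total_size_mb = sum(sizes)

    # A configured target count replaces the actual region count as divisor.
    if target_region_count > 0:
        return total_size_mb / target_region_count
    return total_size_mb / len(sizes) if sizes else 0.0
```

Passing the sizes as a callable is only one way to defer the work; computing the total inside the fallback branches has the same effect.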



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HBASE-25482) Improve SimpleRegionNormalizer#getAverageRegionSizeMb

2021-04-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-25482:
--

> Improve SimpleRegionNormalizer#getAverageRegionSizeMb
> -
>
> Key: HBASE-25482
> URL: https://issues.apache.org/jira/browse/HBASE-25482
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Baiqiang Zhao
> Assignee: Baiqiang Zhao
> Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.2
>
>
> If NormalizerTargetRegionSize is set on the table, we take
> NormalizerTargetRegionSize as avgRegionSize and return it, so the table's
> totalSizeMb does not always need to be calculated.





[jira] [Reopened] (HBASE-25653) Add units and round off region size to 2 digits after decimal

2021-04-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-25653:
--

Reopen for backports.

> Add units and round off region size to 2 digits after decimal
> -
>
> Key: HBASE-25653
> URL: https://issues.apache.org/jira/browse/HBASE-25653
> Project: HBase
> Issue Type: Improvement
> Components: master, Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.3.0, 2.4.0
> Reporter: Nick Dimiduk
> Assignee: Divyesh Chandra
> Priority: Major
> Fix For: 3.0.0-alpha-1
>
>
> Normalizer logs progress and includes details regarding region count and 
> target size. The size values are logged without units, and floating point 
> numbers are logged without a specified precision, which can result in 
> scientific notation. Clean up the formatting of these log messages to make 
> them easily readable.
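As an illustration of the formatting being asked for, here is a small Python sketch (HBase itself is Java, and the helper name is hypothetical). It fixes the precision at two decimal places and appends the unit, which also keeps large or small values out of scientific notation.

```python
def format_region_size_mb(size_mb: float) -> str:
    """Render a region size with two decimal places and an explicit unit."""
    # Fixed-point formatting never switches to exponent notation.
    return f"{size_mb:.2f} MB"
```

For instance, `format_region_size_mb(3.14159)` gives `3.14 MB`.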





[jira] [Resolved] (HBASE-25482) Improve SimpleRegionNormalizer#getAverageRegionSizeMb

2021-04-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk resolved HBASE-25482.
--
Resolution: Fixed

Sorry, wrong issue.

> Improve SimpleRegionNormalizer#getAverageRegionSizeMb
> -
>
> Key: HBASE-25482
> URL: https://issues.apache.org/jira/browse/HBASE-25482
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Baiqiang Zhao
> Assignee: Baiqiang Zhao
> Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.2
>
>
> If NormalizerTargetRegionSize is set on the table, we take
> NormalizerTargetRegionSize as avgRegionSize and return it, so the table's
> totalSizeMb does not always need to be calculated.





[jira] [Reopened] (HBASE-25482) Improve SimpleRegionNormalizer#getAverageRegionSizeMb

2021-04-12 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk reopened HBASE-25482:
--

Reopening for backports.

> Improve SimpleRegionNormalizer#getAverageRegionSizeMb
> -
>
> Key: HBASE-25482
> URL: https://issues.apache.org/jira/browse/HBASE-25482
> Project: HBase
> Issue Type: Improvement
> Components: Normalizer
> Affects Versions: 3.0.0-alpha-1, 2.4.0
> Reporter: Baiqiang Zhao
> Assignee: Baiqiang Zhao
> Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.2
>
>
> If NormalizerTargetRegionSize is set on the table, we take
> NormalizerTargetRegionSize as avgRegionSize and return it, so the table's
> totalSizeMb does not always need to be calculated.





[jira] [Created] (HBASE-25769) Update default weight of cost functions

2021-04-12 Thread Clara Xiong (Jira)
Clara Xiong created HBASE-25769:
---

 Summary: Update default weight of cost functions
 Key: HBASE-25769
 URL: https://issues.apache.org/jira/browse/HBASE-25769
 Project: HBase
  Issue Type: Sub-task
  Components: Balancer
Reporter: Clara Xiong


In production, we have seen some critical big tables that handle the majority of
the load, so table skew is becoming more important. With the update to the table
skew function, the balancer finally works for large table distribution on a large
cluster. We should increase the weight from 35 to a level comparable to region
count skew: 500. We could even go further and replace region count skew with table
skew, since the latter works in the same way and accounts for region distribution
per node.

Another weight we found helpful to increase is that of the store file size cost
function. Ideally, if the normalizer worked perfectly, we would not need to worry
about it, since region count skew would have accounted for it. But we are often in
a situation where it doesn't. Store file distribution needs to be given more weight
as accommodation. We tested changing it from 5 to 200 and it works fine.
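To see why the weights matter, here is a rough Python sketch that treats the balancer's overall cost as a weighted sum of per-function costs. The weights are the ones quoted in this message; the cost values, the dictionary keys, and the assumption that each cost lies in [0, 1] are illustrative only and do not reproduce StochasticLoadBalancer's actual computation.

```python
def total_cost(costs: dict, weights: dict) -> float:
    """Weighted sum of per-function costs, each assumed to lie in [0, 1]."""
    return sum(weights[name] * cost for name, cost in costs.items())


# Weights quoted in the message: table skew 35 -> 500, store file size 5 -> 200,
# with region count skew at 500 in both cases.
current_weights = {"region_count_skew": 500, "table_skew": 35, "store_file_size": 5}
proposed_weights = {"region_count_skew": 500, "table_skew": 500, "store_file_size": 200}

# Hypothetical cluster state: region counts are even, but one table is skewed.
example_costs = {"region_count_skew": 0.0, "table_skew": 0.4, "store_file_size": 0.1}

cost_before = total_cost(example_costs, current_weights)
cost_after = total_cost(example_costs, proposed_weights)
```

With the current weights, a cluster whose region counts are already even contributes very little total cost even when one table is badly skewed, so the balancer has little incentive to move regions. Under the proposed weights the same skew dominates the total.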





Multi-dimensional Range Queries Help

2021-04-12 Thread Kevin Wright
Hi!

Our application requires fast read queries that specify two ranges. One
range on timestamps, and another on ids. We are currently using Apache
HBase as our db, but we’re unsure how to optimally design the row keys /
schemas. Currently, scanning over row key (the ids) with filter on
timeranges is taking more time than what we expect. A normal query would
probably have say 200 rows that match the id range, and about 10 rows that
match both ranges, and we have currently on the order of 10s of millions of
rows.

We’re wondering if there’s something we can do to increase throughput with
HBase (e.g., is there something like composite indexing like in MySQL?).
Not sure if this is the best place to ask this, but if anyone could point
us in the right direction, that would be great!

Thank you!
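One common direction for this kind of query is a composite row key that places the id first and the timestamp second, so that each id's rows are sorted by time and a timestamp range becomes a contiguous row-key range. The Python sketch below shows only the key encoding and the resulting scan ranges. It assumes non-negative integer ids and millisecond timestamps that fit in 64 bits, and an id range small enough to enumerate. The function names are hypothetical and nothing here calls the HBase client API.

```python
import struct


def row_key(entity_id: int, timestamp_ms: int) -> bytes:
    """Encode (id, timestamp) as a 16-byte key that sorts like the pair."""
    # Big-endian unsigned encoding keeps byte order equal to numeric order.
    return struct.pack(">QQ", entity_id, timestamp_ms)


def scan_ranges(id_lo: int, id_hi: int, ts_lo: int, ts_hi: int) -> list:
    """Return one [start, stop) key range per id; input bounds are inclusive."""
    # The stop key is exclusive, so it is built from ts_hi + 1.
    return [
        (row_key(i, ts_lo), row_key(i, ts_hi + 1))
        for i in range(id_lo, id_hi + 1)
    ]
```

Each range could then be issued as its own scan, or the ranges combined in a single scan with a multi-range filter. Whether this beats a single scan over the id range with a time filter depends on how many ids the range covers.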


[jira] [Resolved] (HBASE-25759) The master services field in LocalityBasedCostFunction is never used

2021-04-12 Thread Duo Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang resolved HBASE-25759.
---
Fix Version/s: 2.3.6
   2.4.3
   2.5.0
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.3+.

Thanks [~niuyulin] for reviewing.

> The master services field in LocalityBasedCostFunction is never used
> 
>
> Key: HBASE-25759
> URL: https://issues.apache.org/jira/browse/HBASE-25759
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3, 2.3.6
>
>
> We just use it to test whether we should do the calculation.
> But in fact, it should never be null when we arrive here. Maybe it is because
> in tests we may skip setting the master services, but I think we should just
> skip balancing in the balancer, not in the cost function.





how to join slack channel

2021-04-12 Thread  
Hi, admin

I'm a big data developer and very interested in HBase, so I want to
join the HBase Slack channel, but I don't have an email address like
xx.apache.org or xx.google.com.

How can I join the channel? Please show me a way to join.

Thx



Re: [ANNOUNCE] New HBase committer Geoffrey Jacoby

2021-04-12 Thread Toshihiro Suzuki
Congratulations Geoffrey!

- Toshi

> On Apr 11, 2021, at 0:12, Ankit Singhal  wrote:
> 
> Congratulations and welcome Geoffrey!!
> 
> On Sat, Apr 10, 2021 at 5:48 PM Jan Hentschel <
> jan.hentsc...@ultratendency.com> wrote:
> 
>> Congratulations and welcome!
>> 
>> From: Viraj Jasani 
>> Reply-To: "dev@hbase.apache.org" 
>> Date: Friday, April 9, 2021 at 1:24 PM
>> To: hbase-dev , hbase-user 
>> Subject: [ANNOUNCE] New HBase committer Geoffrey Jacoby
>> 
>> On behalf of the Apache HBase PMC I am pleased to announce that Geoffrey
>> Jacoby has accepted the PMC's invitation to become a committer on the
>> project.
>> 
>> Thanks so much for the work you've been contributing. We look forward to
>> your continued involvement.
>> 
>> Congratulations and welcome, Geoffrey!
>> 
>> 



Re: [VOTE] The first HBase 2.2.7 release candidate (RC0) is available

2021-04-12 Thread Peter Somogyi
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_282): ok
 - mvn clean apache-rat:check
* Built from source (1.8.0_282): ok
 - mvn clean install  -DskipTests
* Unit tests pass (1.8.0_282): ok
 - mvn package -P runAllTests  -Dsurefire.rerunFailingTestsCount=3

On Sun, Apr 11, 2021 at 6:05 PM Jan Hentschel <
jan.hentsc...@ultratendency.com> wrote:

> +1 (binding)
>
> * Signature: ok
> * Checksum : ok
> * Rat check (1.8.0_202-ea): ok
>  - mvn clean apache-rat:check
> * Built from source (1.8.0_202-ea): ok
>  - mvn clean install  -DskipTests
> * Unit tests pass (1.8.0_202-ea): ok
>  - mvn package -P runSmallTests
> -Dsurefire.rerunFailingTestsCount=3
>
> Verified the compatibility report, as well as the changes and release
> notes.
>
> From: Guanghao Zhang 
> Reply-To: "u...@hbase.apache.org" 
> Date: Sunday, April 11, 2021 at 2:34 PM
> To: HBase Dev List , Hbase-User <
> u...@hbase.apache.org>
> Subject: [VOTE] The first HBase 2.2.7 release candidate (RC0) is available
>
> Please vote on this release candidate (RC) for Apache HBase 2.2.7.
> Meanwhile, as branch-2.2 will be EOL, please don't push new commits to it.
> And this will be the last one of the 2.2.x releases. Thanks.
>
> The VOTE will remain open for at least 72 hours.
>
> [ ] +1 Release this package as Apache HBase 2.2.7
> [ ] -1 Do not release this package because ...
>
> The tag to be voted on is 2.2.7RC0. The release files, including
> signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/hbase/2.2.7RC0/
>
> Maven artifacts are available in a staging repository at:
> https://repository.apache.org/content/repositories/orgapachehbase-1440/
>
> Signatures used for HBase RCs can be found in this file:
> https://dist.apache.org/repos/dist/release/hbase/KEYS
>
> The list of bug fixes going into 2.2.7 can be found in included
> CHANGES.md and RELEASENOTES.md available here:
> https://dist.apache.org/repos/dist/dev/hbase/2.2.7RC0/CHANGES.md
> https://dist.apache.org/repos/dist/dev/hbase/2.2.7RC0/RELEASENOTES.md
>
> A detailed source and binary compatibility report for this release is
> available at:
>
> https://dist.apache.org/repos/dist/dev/hbase/2.2.7RC0/api_compare_2.2.7RC0_to_2.2.6.html
>
> To learn more about Apache HBase, please see http://hbase.apache.org/
>
> Thanks,
> Guanghao Zhang
>
>


[jira] [Reopened] (HBASE-25748) [Flake Test][branch-1] TestAdmin2

2021-04-12 Thread Reid Chan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reid Chan reopened HBASE-25748:
---

Spotted a bug in testCreateTableRPCTimeOut, let me try to fix it.

> [Flake Test][branch-1] TestAdmin2
> -
>
> Key: HBASE-25748
> URL: https://issues.apache.org/jira/browse/HBASE-25748
> Project: HBase
> Issue Type: Test
> Components: test
> Reporter: Reid Chan
> Assignee: Reid Chan
> Priority: Major
> Fix For: 1.7.0
>
>
> Attempt to improve the stability.





[jira] [Created] (HBASE-25768) Support a overall simple and fast balance strategy for StochasticLoadBalancer

2021-04-12 Thread Xiaolin Ha (Jira)
Xiaolin Ha created HBASE-25768:
--

 Summary: Support a overall simple and fast balance strategy for 
StochasticLoadBalancer
 Key: HBASE-25768
 URL: https://issues.apache.org/jira/browse/HBASE-25768
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 1.4.13, 2.0.0, 3.0.0-alpha-1
Reporter: Xiaolin Ha
Assignee: Xiaolin Ha


When we use StochasticLoadBalancer + balanceByTable, we can face two
difficulties.
 # For each table, its regions are distributed uniformly, but across the overall
cluster there is still an imbalance between RSes;
 # When there is a large-scale restart of RSes, or an expansion of groups or the
cluster, we hope the balancer can execute as soon as possible, but the
StochasticLoadBalancer may need a lot of time to compute costs.

We can detect these circumstances in StochasticLoadBalancer, and before trying the
normal balance steps, we can add a strategy that lets it just balance like
the SimpleLoadBalancer or use a few light cost functions.
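A rough Python sketch of the detection step described above: flag the cluster for the simple, fast strategy when any region server's region count falls outside a tolerance band around the cluster average. The function name, the `slop` tolerance, and its default value are assumptions made for illustration and are not the proposed implementation.

```python
import math


def needs_simple_balance(regions_per_server: dict, slop: float = 0.2) -> bool:
    """True when any server's region count is outside average * (1 +/- slop)."""
    if not regions_per_server:
        return False
    average = sum(regions_per_server.values()) / len(regions_per_server)
    # Round the band outward so small clusters are not flagged by rounding alone.
    floor = math.floor(average * (1 - slop))
    ceiling = math.ceil(average * (1 + slop))
    return any(
        count < floor or count > ceiling
        for count in regions_per_server.values()
    )
```

A freshly restarted or newly added server holds zero regions, so it falls below the floor and triggers the fast path. This corresponds to the second difficulty listed above.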
