[jira] [Updated] (HBASE-23860) HBase Thrift bindings generating broken code

2020-02-17 Thread Hari Sekhon (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-23860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-23860:

Description: 
Generated Perl Thrift bindings are broken:
{code:java}
$ thrift --gen perl 
src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

$ cd gen-perl

$ perl -I . -Tc Hbase/Hbase.pm
Subroutine new redefined at /Library/Perl/5.18/Thrift/Exception.pm line 38.
Bareword "Thrift::TMessageType::CALL" not allowed while "strict subs" in use at 
Hbase/Hbase.pm line 9897.
BEGIN not safe after errors--compilation aborted at Hbase/Hbase.pm line 12163.
{code}
 

Tested against builds from source of both HBase 2.1.2 and HBase 2.2.3, using 
both Thrift 0.12 and 0.13.
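
A quick diagnostic sketch (assumption, not verified: the installed Thrift 
0.12/0.13 Perl library ships Thrift/MessageType.pm defining the 
Thrift::TMessageType constants) - pre-loading that module before the compile 
check would show whether the generated binding is simply not loading the 
modules whose constants it references:
{code:bash}
# Hypothetical diagnostic, not a fix: pre-load Thrift::MessageType so that
# Thrift::TMessageType::CALL is already defined when Hbase.pm is compiled.
cd gen-perl
perl -I . -MThrift::MessageType -c Hbase/Hbase.pm
{code}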

 

  was:
Generated perl thrift bindings are broken:
{code:java}
$ thrift --gen perl 
src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

$ cd gen-perl

$ perl -I . -Tc Hbase/Hbase.pm
Subroutine new redefined at /Library/Perl/5.18/Thrift/Exception.pm line 38.
Bareword "Thrift::TMessageType::CALL" not allowed while "strict subs" in use at 
Hbase/Hbase.pm line 9897.
BEGIN not safe after errors--compilation aborted at Hbase/Hbase.pm line 12163.
{code}
 

Tested from build from source of both HBase 2.1.2 and 2.2.3 using both Thrift 
0.12 and 0.13.

 


> HBase Thrift bindings generating broken code
> 
>
> Key: HBASE-23860
> URL: https://issues.apache.org/jira/browse/HBASE-23860
> Project: HBase
>  Issue Type: Bug
>  Components: Thrift
>Affects Versions: 2.2.3
>Reporter: Hari Sekhon
>Priority: Major
>
> Generated Perl Thrift bindings are broken:
> {code:java}
> $ thrift --gen perl 
> src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
> $ cd gen-perl
> $ perl -I . -Tc Hbase/Hbase.pm
> Subroutine new redefined at /Library/Perl/5.18/Thrift/Exception.pm line 38.
> Bareword "Thrift::TMessageType::CALL" not allowed while "strict subs" in use 
> at Hbase/Hbase.pm line 9897.
> BEGIN not safe after errors--compilation aborted at Hbase/Hbase.pm line 12163.
> {code}
>  
> Tested against builds from source of both HBase 2.1.2 and HBase 2.2.3, using 
> both Thrift 0.12 and 0.13.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HBASE-23860) HBase Thrift bindings generating broken code

2020-02-17 Thread Hari Sekhon (Jira)
Hari Sekhon created HBASE-23860:
---

 Summary: HBase Thrift bindings generating broken code
 Key: HBASE-23860
 URL: https://issues.apache.org/jira/browse/HBASE-23860
 Project: HBase
  Issue Type: Bug
  Components: Thrift
Affects Versions: 2.2.3
Reporter: Hari Sekhon


Generated Perl Thrift bindings are broken:
{code:java}
$ thrift --gen perl 
src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

$ cd gen-perl

$ perl -I . -Tc Hbase/Hbase.pm
Subroutine new redefined at /Library/Perl/5.18/Thrift/Exception.pm line 38.
Bareword "Thrift::TMessageType::CALL" not allowed while "strict subs" in use at 
Hbase/Hbase.pm line 9897.
BEGIN not safe after errors--compilation aborted at Hbase/Hbase.pm line 12163.
{code}
 

Tested against builds from source of both HBase 2.1.2 and 2.2.3, using both 
Thrift 0.12 and 0.13.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-21006) Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-20 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon resolved HBASE-21006.
-
Resolution: Duplicate

> Balancer - data locality drops 30-40% across all nodes after every 
> cluster-wide rolling restart, not migrating regions back to original 
> RegionServers?
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-17 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583615#comment-16583615
 ] 

Hari Sekhon commented on HBASE-21014:
-

I'm inclined to leave this open rather than close it as a duplicate, because 
this is a really important improvement and people are more likely to find one 
of these two tickets if both are left open.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-17 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583614#comment-16583614
 ] 

Hari Sekhon commented on HBASE-21014:
-

Yes, it looks like the FavoredStochasticLoadBalancer was supposed to become a 
thing but didn't get finished, and there hasn't been any movement on it in 
nearly 2 years :(

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-10 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576290#comment-16576290
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/10/18 1:36 PM:
--

I thought that was really the crux of it:

Write the HDFS location preference hints the same way the FavoredNodeBalancer 
does, while applying all the usual Stochastic Balancer heuristics to make sure 
regions and load are evenly spread. Since the HBase Balancer chooses where to 
move regions, it can update the block location preference metadata to match 
whenever it migrates regions.

That way, when you need to rebalance HDFS blocks, the HDFS Balancer won't move 
the region blocks away from the RegionServers the regions are being served 
from, thereby preserving HBase data locality.


was (Author: harisekhon):
I thought that was really the crux of it:

Write the HDFS location preference hints the same as the FavoredNodeBalancer 
does while applying all the usual Stochastic Balancer balancing heuristics to 
make sure regions and load are evenly spread. Since the HBase Balancer chooses 
where to move regions to it can update the block location preferences metadata 
to match it whenever it migrates regions.

That way if I need to rebalance HDFS blocks, the HDFS Balancer won't move the 
region blocks out of their primary active region locations when hdfs block 
pinning is enabled.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-10 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576290#comment-16576290
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/10/18 1:33 PM:
--

I thought that was really the crux of it:

Write the HDFS location preference hints the same as the FavoredNodeBalancer 
does while applying all the usual Stochastic Balancer balancing heuristics to 
make sure regions and load are evenly spread. Since the HBase Balancer chooses 
where to move regions to it can update the block location preferences metadata 
to match it whenever it migrates regions.

That way if I need to rebalance HDFS blocks, the HDFS Balancer won't move the 
region blocks out of their primary active region locations when hdfs block 
pinning is enabled.


was (Author: harisekhon):
I thought this was really the crux of it:

Write the HDFS location preference hints the same as the FavoredNodeBalancer 
does while applying all the usual Stochastic Balancer balancing heuristics to 
make sure regions and load are evenly spread.

That way if I need to rebalance HDFS blocks, the HDFS Balancer won't move the 
region blocks out of their primary active region locations when hdfs block 
pinning is enabled.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-10 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576290#comment-16576290
 ] 

Hari Sekhon commented on HBASE-21014:
-

I thought this was really the crux of it:

Write the HDFS location preference hints the same as the FavoredNodeBalancer 
does while applying all the usual Stochastic Balancer balancing heuristics to 
make sure regions and load are evenly spread.

That way if I need to rebalance HDFS blocks, the HDFS Balancer won't move the 
region blocks out of their primary active region locations when hdfs block 
pinning is enabled.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-09 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574532#comment-16574532
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/9/18 9:10 AM:
-

Yes, this is what I thought, hence I'd already linked HBASE-7932 and the book 
reference, as well as a mailing list discussion with some of my ex-colleagues 
from Cloudera who really know their stuff, like Harsh J and Lars George.

So really what I'm asking for is for the Stochastic Balancer to include the 
hint writes like the FavoredNodeBalancer does.

I already have dfs.datanode.block-pinning.enabled = true; it's just not much 
use until the Stochastic Balancer gets this support, as I don't want to lose 
the better balancing, which is used more often than an HDFS rebalance.
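
For reference, the HDFS-side prerequisite mentioned above is just this in 
hdfs-site.xml on the DataNodes (already in place here); the missing piece is 
the balancer writing the favoured-node hints in the first place:
{code:xml}
<!-- hdfs-site.xml: have the HDFS Balancer leave pinned (favored-node) blocks alone -->
<property>
  <name>dfs.datanode.block-pinning.enabled</name>
  <value>true</value>
</property>
{code}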


was (Author: harisekhon):
Yes this is what I thought hence I'd already linked HBase-7932 and book 
reference as well as a discussion on the mailing list from some of my 
ex-colleagues from Cloudera who really know their stuff like Harsh J and Lars 
George.

So really what I'm asking for is for the Stochastic Balancer to include the 
hint writes like the FavoredNodeBalancer.

I already have dfs.datanode.block-pinning.enabled = true, it's just not much 
use until the Stochastic Balancer gets this support as I don't want to lose the 
better balancing which is used more often than an hdfs rebalance.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-09 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574532#comment-16574532
 ] 

Hari Sekhon commented on HBASE-21014:
-

Yes this is what I thought hence I'd already linked HBase-7932 and book 
reference as well as a discussion on the mailing list from some of my 
ex-colleagues from Cloudera who really know their stuff like Harsh J and Lars 
George.

So really what I'm asking for is for the Stochastic Balancer to include the 
hint writes like the FavoredNodeBalancer.

I already have dfs.datanode.block-pinning.enabled = true, it's just not much 
use until the Stochastic Balancer gets this support as I don't want to lose the 
better balancing which is used more often than an hdfs rebalance.

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/8/18 1:01 PM:
-

I thought that was the whole point of the FavoredNodeLoadBalancer - so that 
the HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations, and the HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact, but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is in production and heavily loaded
 # this cluster is already running around 70-80% full, which, combined with 
HBase rolling snapshots covering 4 days, means that more than the one scheduled 
major compaction a week would cause space exhaustion resulting in an outage, as 
the prior blocks are not removed (this very nearly happened the first time I 
ran major compaction on this cluster, but I realised what was going on and took 
quick action to avoid an outage - annoyingly there is no major compaction 
cancel command in this version of HBase, so it couldn't just be stopped once 
started and it ran for several hours)


was (Author: harisekhon):
I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage as the prior 
blocks are not removed (this very nearly happened the first time I ran major 
compaction on this cluster but I realised what was going on and took quick 
action to avoid an outage - annoyingly there is not yet a major compaction 
cancel command in this version of HBase so it couldn't just be stopped once 
started and ran for several hours)

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/8/18 12:46 PM:
--

I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage as the prior 
blocks are not removed (this very nearly happened the first time I ran major 
compaction on this cluster but I realised what was going on and took quick 
action to avoid an outage - annoyingly there is not yet a major compaction 
cancel command in this version of HBase so it couldn't just be stopped once 
started and ran for several hours)


was (Author: harisekhon):
I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage (this very 
nearly happened the first time I ran major compaction on this cluster but I 
realised what was going on and took quick action to avoid an outage - 
annoyingly there is not yet a major compaction cancel command in this version 
of HBase so it couldn't just be stopped once started and ran for several hours)

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/8/18 12:45 PM:
--

I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage (this very 
nearly happened the first time I ran major compaction on this cluster but I 
realised what was going on and took quick action to avoid an outage - 
annoyingly there is not yet a major compaction cancel command in this version 
of HBase so it couldn't just be stopped once started and ran for several hours)


was (Author: harisekhon):
I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage (this very 
nearly happened the first time I ran major compaction on this cluster but I 
realised what was going on and took quick action to avoid an outage - 
annoyingly there is not yet a major compaction cancel command in this version 
of HBase)

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/8/18 12:44 PM:
--

I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

Normally I would just balance and then major compact but there are 2 issues 
with running major compaction:
 # performance impact - this cluster is production and heavily loaded
 # this cluster is already running around 70-80% full which combined with HBase 
rolling snapshots covering 4 days means that more than the one scheduled major 
compaction a week would space exhaustion resulting in an outage (this very 
nearly happened the first time I ran major compaction on this cluster but I 
realised what was going on and took quick action to avoid an outage - 
annoyingly there is not yet a major compaction cancel command in this version 
of HBase)


was (Author: harisekhon):
I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon edited comment on HBASE-21014 at 8/8/18 12:37 PM:
--

I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer can read the preferred location hints 
and not move those blocks, therefore not losing data locality?


was (Author: harisekhon):
I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer does not move them and lose data 
locality?

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-08 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573153#comment-16573153
 ] 

Hari Sekhon commented on HBASE-21014:
-

I thought that was the whole point of the FavoredNodeLoadBalancer  - so that 
HBase Balancer can write those HDFS hints based on the knowledge it has of 
region locations so that HDFS Balancer does not move them and lose data 
locality?

> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-07 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21014:

Description: 
Improve Stochastic Balancer to include the HDFS region location hints to avoid 
HDFS Balancer destroying data locality.

Right now, according to a mix of docs, Jiras and mailing list info, it appears 
that one must change
{code:java}
hbase.master.loadbalancer.class{code}
to org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer, as it looks like 
this functionality is only in the FavoredNodeBalancer and not the standard 
Stochastic Balancer.

[http://hbase.apache.org/book.html#_hbase_and_hdfs]

This is not ideal because we'd still like to use all the heuristics and work 
that have gone into the Stochastic Balancer, which I believe is currently the 
best and most mature HBase balancer.

See also the linked Jiras and this discussion:

[http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]
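
For clarity, the documented workaround amounts to something like the following 
in hbase-site.xml on the Master (sketch only - switching balancer 
implementations is exactly what this ticket would like to avoid):
{code:xml}
<!-- hbase-site.xml: replace the default StochasticLoadBalancer with the favored-nodes balancer -->
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer</value>
</property>
{code}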

  was:
Improve Stochastic Balancer to include the HDFS region location hints to avoid 
HDFS Balancer destroying data locality.

Right now according to a mix of docs, jiras and mailing list info it appears 
that one must change
{code:java}
hbase.master.loadbalancer.class{code}
to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks like 
this functionality is only within FavouredNodeBalancer and not the standard 
Stochastic Balancer.

[http://hbase.apache.org/book.html#_hbase_and_hdfs]

This is not ideal because we'd still like to use all the heuristics and work 
that has gone in the Stochastic Balancer which I believe right now is the best 
and most mature HBase balancer.

See also the linked Jiras and this discussion:

[http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]


> Improve Stochastic Balancer to write HDFS favoured node hints for region 
> primary blocks to avoid destroying data locality if needing to use HDFS 
> Balancer
> -
>
> Key: HBASE-21014
> URL: https://issues.apache.org/jira/browse/HBASE-21014
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Improve Stochastic Balancer to include the HDFS region location hints to 
> avoid HDFS Balancer destroying data locality.
> Right now according to a mix of docs, jiras and mailing list info it appears 
> that one must change
> {code:java}
> hbase.master.loadbalancer.class{code}
> to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks 
> like this functionality is only within FavoredNodeBalancer and not the 
> standard Stochastic Balancer.
> [http://hbase.apache.org/book.html#_hbase_and_hdfs]
> This is not ideal because we'd still like to use all the heuristics and work 
> that has gone in the Stochastic Balancer which I believe right now is the 
> best and most mature HBase balancer.
> See also the linked Jiras and this discussion:
> [http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21014) Improve Stochastic Balancer to write HDFS favoured node hints for region primary blocks to avoid destroying data locality if needing to use HDFS Balancer

2018-08-06 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-21014:
---

 Summary: Improve Stochastic Balancer to write HDFS favoured node 
hints for region primary blocks to avoid destroying data locality if needing to 
use HDFS Balancer
 Key: HBASE-21014
 URL: https://issues.apache.org/jira/browse/HBASE-21014
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 1.1.2
Reporter: Hari Sekhon


Improve Stochastic Balancer to include the HDFS region location hints to avoid 
HDFS Balancer destroying data locality.

Right now according to a mix of docs, jiras and mailing list info it appears 
that one must change
{code:java}
hbase.master.loadbalancer.class{code}
to the org.apache.hadoop.hbase.favored.FavoredNodeLoadBalancer as it looks like 
this functionality is only within FavouredNodeBalancer and not the standard 
Stochastic Balancer.

[http://hbase.apache.org/book.html#_hbase_and_hdfs]

This is not ideal because we'd still like to use all the heuristics and work 
that has gone in the Stochastic Balancer which I believe right now is the best 
and most mature HBase balancer.

See also the linked Jiras and this discussion:

[http://apache-hbase.679495.n3.nabble.com/HDFS-Balancer-td4086607.html]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-06 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Issue Type: Bug  (was: Improvement)

> Balancer - data locality drops 30-40% across all nodes after every 
> cluster-wide rolling restart, not migrating regions back to original 
> RegionServers?
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Bug
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21006) Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568720#comment-16568720
 ] 

Hari Sekhon commented on HBASE-21006:
-

I will do follow-up analysis of this on Monday, as it's late here in London.

I saw HBASE-18164 but didn't think it quite fitted, and I didn't see 
HBASE-18036. If it's covered by that one then I'll close this as a duplicate 
after we review it next week.

> Balancer - data locality drops 30-40% across all nodes after every 
> cluster-wide rolling restart, not migrating regions back to original 
> RegionServers?
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21006) Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568697#comment-16568697
 ] 

Hari Sekhon commented on HBASE-21006:
-

I have, and pointed them to this Jira - that way Google can do its job in case 
anyone else is wondering why their data locality is destroyed after every 
rolling restart.

> Balancer - data locality drops 30-40% across all nodes after every 
> cluster-wide rolling restart, not migrating regions back to original 
> RegionServers?
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% across all nodes after every cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% across all nodes after every 
cluster-wide rolling restart, not migrating regions back to original 
RegionServers?  (was: Balancer - data locality drops 30-40% after each 
cluster-wide rolling restart, not migrating regions back to original 
RegionServers?)

> Balancer - data locality drops 30-40% across all nodes after every 
> cluster-wide rolling restart, not migrating regions back to original 
> RegionServers?
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster-wide rolling restart, not migrating regions back to original RegionServers?

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% after each cluster-wide 
rolling restart, not migrating regions back to original RegionServers?  (was: 
Balancer - data locality drops 30-40% after each cluster-wide rolling restart, 
not migrating regions back to original regionservers?)

> Balancer - data locality drops 30-40% after each cluster-wide rolling 
> restart, not migrating regions back to original RegionServers?
> 
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster-wide rolling restart, not migrating regions back to original regionservers?

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Description: 
After doing rolling restarts of my HBase cluster the data locality drops by 
30-40% every time which implies the stochastic balancer is not optimizing for 
data locality enough, at least not under the circumstance of rolling restarts, 
and that it must not be balancing the regions back to their original 
RegionServers.

The stochastic balancer is supposed to take data locality into account, but if 
this is the case, surely it should move regions back to their original 
RegionServers and data locality should return to around where it was, not drop 
by 30-40% every time I need to do some tuning and a rolling restart.
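
A rough way to quantify this is to sample locality per RegionServer before and 
after the rolling restart (sketch - assumes the HBase 1.x default RegionServer 
info port 16030, 60030 on some distributions, and the standard 
"percentFilesLocal" metric exposed by the /jmx servlet; hostnames below are 
placeholders):
{code:bash}
# Sample HDFS data locality per RegionServer from the metrics JMX servlet
for rs in rs1.example.com rs2.example.com rs3.example.com; do
  printf '%s: ' "$rs"
  curl -s "http://${rs}:16030/jmx?qry=Hadoop:service=HBase,name=RegionServer,sub=Server" \
    | grep -o '"percentFilesLocal"[^,]*'
done
{code}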

  was:
After doing rolling restarts of my HBase cluster the data locality drops by 
30-40% every time which implies the stochastic balancer is not optimizing for 
data locality enough, at least not under the circumstance of rolling restarts.

The stochastic balancer is supposed to take data locality in to account but if 
this is the case, surely it should move regions back to their original 
RegionServer and data locality should return back to around where it was, not 
drop by 30-40% percent every time I need to do some tuning and a rolling 
restarts.


> Balancer - data locality drops 30-40% after each cluster-wide rolling 
> restart, not migrating regions back to original regionservers?
> 
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by 
> 30-40% every time which implies the stochastic balancer is not optimizing for 
> data locality enough, at least not under the circumstance of rolling 
> restarts, and that it must not be balancing the regions back to their 
> original RegionServers.
> The stochastic balancer is supposed to take data locality into account, but 
> if this is the case, surely it should move regions back to their original 
> RegionServers and data locality should return to around where it was, not 
> drop by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster-wide rolling restart, not migrating regions back to original regionservers?

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% after each cluster-wide 
rolling restart, not migrating regions back to original regionservers?  (was: 
Balancer - data locality drops 30-40% after each cluster-wide rolling restart, 
not migrating regions back to original regionservers)

> Balancer - data locality drops 30-40% after each cluster-wide rolling 
> restart, not migrating regions back to original regionservers?
> 
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster-wide rolling restart, not migrating regions back to original regionservers?

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Description: 
After doing rolling restarts of my HBase cluster the data locality drops by
30-40% every time, which implies the stochastic balancer is not optimizing for
data locality enough, at least not under the circumstance of rolling restarts,
and that it must not be balancing the regions back to their original
RegionServers.

The stochastic balancer is supposed to take data locality into account, but if
this is the case, surely it should move regions back to their original
RegionServers and data locality should return to around where it was, not drop
by 30-40% every time I need to do some tuning and a rolling restart.

  was:
After doing rolling restarts of my HBase cluster the data locality drops by 
30-40% every time which implies the stochastic balancer is not optimizing for 
data locality enough, at least not under the circumstance of rolling restarts, 
and that it must not be balancing the regions back to their original 
RegionServers.

The stochastic balancer is supposed to take data locality in to account but if 
this is the case, surely it should move regions back to their original 
RegionServers and data locality should return back to around where it was, not 
drop by 30-40% percent every time I need to do some tuning and a rolling 
restarts.


> Balancer - data locality drops 30-40% after each cluster-wide rolling 
> restart, not migrating regions back to original regionservers?
> 
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts,
> and that it must not be balancing the regions back to their original
> RegionServers.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServers and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568474#comment-16568474
 ] 

Hari Sekhon edited comment on HBASE-21006 at 8/3/18 4:58 PM:
-

Data locality % improves very, very slowly over the course of days after 
rolling restarts, e.g. a few percent a day until it comes back up to around 90+%.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up by moving regions back where they started after a rolling 
restart across the cluster.


was (Author: harisekhon):
Data locality % improves very, very slowly over the course of days after 
rolling restarts, eg. a few percent a day until it comes back up to around 90+%.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster rolling restart, not migrating regions back to original regionservers

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% after each cluster rolling 
restart, not migrating regions back to original regionservers  (was: Balancer - 
data locality drops 30-40% after rolling restarts on cluster, not migrating 
regions back to original regionservers)

> Balancer - data locality drops 30-40% after each cluster rolling restart, not 
> migrating regions back to original regionservers
> --
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after each cluster-wide rolling restart, not migrating regions back to original regionservers

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% after each cluster-wide 
rolling restart, not migrating regions back to original regionservers  (was: 
Balancer - data locality drops 30-40% after each cluster rolling restart, not 
migrating regions back to original regionservers)

> Balancer - data locality drops 30-40% after each cluster-wide rolling 
> restart, not migrating regions back to original regionservers
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-21006) Balancer - data locality drops 30-40% after rolling restarts on cluster, not migrating regions back to original regionservers

2018-08-03 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-21006:

Summary: Balancer - data locality drops 30-40% after rolling restarts on 
cluster, not migrating regions back to original regionservers  (was: Balancer - 
data locality drops hugely after rolling restarts on cluster, not factoring in 
data locality enough?)

> Balancer - data locality drops 30-40% after rolling restarts on cluster, not 
> migrating regions back to original regionservers
> -
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568474#comment-16568474
 ] 

Hari Sekhon edited comment on HBASE-21006 at 8/3/18 4:57 PM:
-

Data locality % improves very, very slowly over the course of days after 
rolling restarts, e.g. a few percent a day until it comes back up to around 90+%.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.


was (Author: harisekhon):
Data locality % improves very, very slowly over the course of days after 
rolling restarts, eg. a few percent a day until it comes back up to around 90%.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568474#comment-16568474
 ] 

Hari Sekhon edited comment on HBASE-21006 at 8/3/18 4:57 PM:
-

Data locality % improves very, very slowly over the course of days after 
rolling restarts, e.g. a few percent a day until it comes back up to around 90%.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.


was (Author: harisekhon):
Data locality % improves very, very slowly over the course of days after 
rolling restarts.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568474#comment-16568474
 ] 

Hari Sekhon edited comment on HBASE-21006 at 8/3/18 4:56 PM:
-

Data locality % improves very, very slowly over the course of days after 
rolling restarts.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.


was (Author: harisekhon):
Data locality improves very, very slowly over the course of days after rolling 
restarts.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568474#comment-16568474
 ] 

Hari Sekhon commented on HBASE-21006:
-

Data locality improves very, very slowly over the course of days after rolling 
restarts.

I suspect this is due to minor compactions re-writing blocks locally and not 
due to region migrations, which would mean that the balancer isn't optimising 
data locality back up.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-18164) Much faster locality cost function and candidate generator

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-18164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568473#comment-16568473
 ] 

Hari Sekhon commented on HBASE-18164:
-

Has anyone tested if this results in faster convergence of optimum region 
placement for data locality after rolling restarts?

In the enterprise, HDP 2.6 still ships HBase 1.1.2, and I find it takes a very
long time for data locality % to recover after dropping 30-40% after each
rolling restart of the cluster.

When I say a very long time, I mean days of gradual, very slow improvement in
data locality (which I think is probably caused more by minor compactions than
by region migrations).

I've linked HBASE-21006 to cover poor locality after rolling restarts.
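
For reference, the recovery can be tracked by polling each RegionServer's /jmx
endpoint for its block locality percentage. A minimal sketch, assuming the
percentFilesLocal metric on the RegionServer Server bean and the default info
port 16030 (the hostnames below are placeholders):
{code:python}
#!/usr/bin/env python3
# Minimal sketch: print each RegionServer's HDFS block locality percentage from /jmx.
# Assumes the 'percentFilesLocal' metric on the RegionServer Server bean and the
# default info port 16030; the hostnames below are placeholders.
import json
from urllib.request import urlopen

REGIONSERVERS = ['rs1.example.com', 'rs2.example.com']  # placeholder hostnames
INFO_PORT = 16030

def locality_percent(host):
    url = ('http://%s:%d/jmx?qry=Hadoop:service=HBase,'
           'name=RegionServer,sub=Server' % (host, INFO_PORT))
    with urlopen(url) as resp:
        bean = json.load(resp)['beans'][0]
    return bean.get('percentFilesLocal')

for rs in REGIONSERVERS:
    print('%s locality = %s%%' % (rs, locality_percent(rs)))
{code}
Watching that number per node after a restart makes the days-long recovery
curve very visible.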

> Much faster locality cost function and candidate generator
> --
>
> Key: HBASE-18164
> URL: https://issues.apache.org/jira/browse/HBASE-18164
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Reporter: Kahlil Oppenheimer
>Assignee: Kahlil Oppenheimer
>Priority: Critical
> Fix For: 1.4.0, 2.0.0-alpha-2, 2.0.0
>
> Attachments: 18164.branch-1.addendum.txt, HBASE-18164-00.patch, 
> HBASE-18164-01.patch, HBASE-18164-02.patch, HBASE-18164-04.patch, 
> HBASE-18164-05.patch, HBASE-18164-06.patch, HBASE-18164-07.patch, 
> HBASE-18164-08.patch
>
>
> We noticed that the stochastic load balancer was not scaling well with
> cluster size. That is to say that on our smaller clusters (~17 tables, ~12
> region servers, ~5k regions), the balancer considers ~100,000 cluster
> configurations in 60s per balancer run, but only ~5,000 per 60s on our bigger
> clusters (~82 tables, ~160 region servers, ~13k regions).
> Because of this, our bigger clusters are not able to converge on balance as 
> quickly for things like table skew, region load, etc. because the balancer 
> does not have enough time to "think".
> We have re-written the locality cost function to be incremental, meaning it 
> only recomputes cost based on the most recent region move proposed by the 
> balancer, rather than recomputing the cost across all regions/servers every 
> iteration.
> Further, we also cache the locality of every region on every server at the 
> beginning of the balancer's execution for both the LocalityBasedCostFunction 
> and the LocalityCandidateGenerator to reference. This way, they need not 
> collect all HDFS blocks of every region at each iteration of the balancer.
> The changes have been running in all 6 of our production clusters and all 4 
> QA clusters without issue. The speed improvements we noticed are massive. Our 
> big clusters now consider 20x more cluster configurations.
> One design decision I made is to consider locality cost as the difference 
> between the best locality that is possible given the current cluster state, 
> and the currently measured locality. The old locality computation would 
> measure the locality cost as the difference from the current locality and 
> 100% locality, but this new computation instead takes the difference between 
> the current locality for a given region and the best locality for that region 
> in the cluster.
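
The incremental approach described above can be illustrated with a rough sketch
(this is not the actual HBase patch; the class and method names here are made up
for illustration): cache the per-region/per-server locality once, keep a running
cost, and update it in O(1) for each proposed move.
{code:python}
# Simplified illustration of an incremental locality cost (not the HBase code).
# locality[region][server] is assumed to be precomputed once from cached HDFS
# block locations; assignment maps each region to its current server.
class IncrementalLocalityCost(object):
    def __init__(self, locality, assignment):
        self.locality = locality            # region -> {server -> locality in [0, 1]}
        self.assignment = dict(assignment)  # region -> current server
        # best achievable locality per region given the current cluster state
        self.best = {r: max(locs.values()) for r, locs in locality.items()}
        # running sum of (best - current); computed in full only once
        self.cost = sum(self.best[r] - self.locality[r][s]
                        for r, s in self.assignment.items())

    def propose_move(self, region, new_server):
        """Cost the cluster would have after moving `region`, in O(1)."""
        old_server = self.assignment[region]
        delta = self.locality[region][old_server] - self.locality[region][new_server]
        return self.cost + delta

    def apply_move(self, region, new_server):
        self.cost = self.propose_move(region, new_server)
        self.assignment[region] = new_server
{code}
Caching the locality table up front is what makes each balancer iteration
cheap: a proposed move costs two dictionary lookups instead of a rescan of
every region.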



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568468#comment-16568468
 ] 

Hari Sekhon commented on HBASE-21006:
-

I've linked HBASE-18164 as the HubSpot guys did some excellent work on tuning
and balancing. I've done an excessive amount of tuning on this HBase OpenTSDB
cluster over the last few weeks, a lot of it based on their research mixed in
with a lot of my own.

I don't know if the faster convergence in that ticket would solve this too, but
I suspect it might if the cost calculation chooses to shuffle regions to improve
locality. I don't have access to hardware and load anywhere near comparable to
this environment to test it.

> Balancer - data locality drops hugely after rolling restarts on cluster, not 
> factoring in data locality enough?
> ---
>
> Key: HBASE-21006
> URL: https://issues.apache.org/jira/browse/HBASE-21006
> Project: HBase
>  Issue Type: Improvement
>  Components: Balancer
>Affects Versions: 1.1.2
> Environment: HDP 2.6
>Reporter: Hari Sekhon
>Priority: Major
>
> After doing rolling restarts of my HBase cluster the data locality drops by
> 30-40% every time, which implies the stochastic balancer is not optimizing for
> data locality enough, at least not under the circumstance of rolling restarts.
> The stochastic balancer is supposed to take data locality into account, but if
> this is the case, surely it should move regions back to their original
> RegionServer and data locality should return to around where it was, not drop
> by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21006) Balancer - data locality drops hugely after rolling restarts on cluster, not factoring in data locality enough?

2018-08-03 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-21006:
---

 Summary: Balancer - data locality drops hugely after rolling 
restarts on cluster, not factoring in data locality enough?
 Key: HBASE-21006
 URL: https://issues.apache.org/jira/browse/HBASE-21006
 Project: HBase
  Issue Type: Improvement
  Components: Balancer
Affects Versions: 1.1.2
 Environment: HDP 2.6
Reporter: Hari Sekhon


After doing rolling restarts of my HBase cluster the data locality drops by
30-40% every time, which implies the stochastic balancer is not optimizing for
data locality enough, at least not under the circumstance of rolling restarts.

The stochastic balancer is supposed to take data locality into account, but if
this is the case, surely it should move regions back to their original
RegionServer and data locality should return to around where it was, not drop
by 30-40% every time I need to do some tuning and a rolling restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-08-01 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Affects Version/s: 1.1.2

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster/RS Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-30 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561777#comment-16561777
 ] 

Hari Sekhon commented on HBASE-20883:
-

Read/Write requests per sec could also be added to the RegionServer UI's Base
Stats section next to Requests Per Sec, which is the total, and also to each
RegionServer's JMX API.
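
Until that lands, the per-second read/write rates can be derived client-side by
sampling each RegionServer's JMX counters twice. A rough sketch, assuming the
readRequestCount / writeRequestCount counters on the RegionServer Server bean
(hostname, port and interval are placeholders):
{code:python}
#!/usr/bin/env python3
# Rough sketch: derive per-RegionServer read/write requests per second by taking
# two samples of the JMX counters. Host, port and interval below are placeholders.
import json
import time
from urllib.request import urlopen

def server_bean(host, port=16030):
    url = ('http://%s:%d/jmx?qry=Hadoop:service=HBase,'
           'name=RegionServer,sub=Server' % (host, port))
    with urlopen(url) as resp:
        return json.load(resp)['beans'][0]

def rates(host, interval=10):
    first = server_bean(host)
    time.sleep(interval)
    second = server_bean(host)
    reads = (second['readRequestCount'] - first['readRequestCount']) / float(interval)
    writes = (second['writeRequestCount'] - first['writeRequestCount']) / float(interval)
    return reads, writes

reads_per_sec, writes_per_sec = rates('regionserver1.example.com')
print('reads/sec=%.1f writes/sec=%.1f' % (reads_per_sec, writes_per_sec))
{code}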

> HMaster/RS Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> -
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show a significant enough difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster/RS Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-30 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Summary: HMaster/RS Read / Write Requests Per Sec across RegionServers, 
currently only Total Requests Per Sec   (was: HMaster Read / Write Requests Per 
Sec across RegionServers, currently only Total Requests Per Sec )

> HMaster/RS Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> -
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show a significant enough difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-27 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560029#comment-16560029
 ] 

Hari Sekhon commented on HBASE-20883:
-

I probably won't get round to this any time soon, as I solved it quickly by
writing an external tool before raising this ticket and am on to the next
thing; I merely raised this as a future improvement to do at some point.

Request breakdowns by region and by regionserver are available along with a 
selection of other HBase tools in my PyTools github repo:

[https://github.com/HariSekhon/pytools]

 

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show a significant enough difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-27 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559548#comment-16559548
 ] 

Hari Sekhon edited comment on HBASE-20945 at 7/27/18 11:34 AM:
---

Yes but the JMX API already exposes the following information per region in 
RegionServer JMX:
 * compactionsCompletedCount
 * numBytesCompactedCount
 * numFilesCompactedCount

and 12 other metrics per region next to them.

Adding one or two more for the timestamps of last compaction started and
completed seems to be in line with what has already been done and would go from
15 to 17 metrics per region, so it shouldn't break the bank.

For HMaster you probably only want a summary JMX field per table similar to 
what is shown in the UI under table.jsp for whether a table is currently 
compacting:
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}
Adding a field next to Compaction showing Time of Last Compaction is in line 
with what has already been done.


was (Author: harisekhon):
Yes but the JMX API already exposes the following information per region in 
RegionServer JMX:
 * compactionsCompletedCount
 * numBytesCompactedCount
 * numFilesCompactedCount

and 12 other metrics per region next to them.

Adding one or two more fields for the timestamps of last compaction started and
completed seems to be in line with what has already been done.

For HMaster you probably only want a summary JMX field per table similar to 
what is shown in the UI under table.jsp for whether a table is currently 
compacting:
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}
Adding a field next to Compaction showing Time of Last Compaction is in line 
with what has already been done.

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-27 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559548#comment-16559548
 ] 

Hari Sekhon edited comment on HBASE-20945 at 7/27/18 10:38 AM:
---

Yes but the JMX API already exposes the following information per region in 
RegionServer JMX:
 * compactionsCompletedCount
 * numBytesCompactedCount
 * numFilesCompactedCount

and 12 other metrics per region next to them.

Adding one or two more fields for the timestamps of last compaction started and
completed seems to be in line with what has already been done.

For HMaster you probably only want a summary JMX field per table similar to 
what is shown in the UI under table.jsp for whether a table is currently 
compacting:
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}
Adding a field next to Compaction showing Time of Last Compaction is in line 
with what has already been done.


was (Author: harisekhon):
Yes but the JMX API already exposes the following information per region in 
RegionServer JMX:
 * compactionsCompletedCount
 * numBytesCompactedCount
 * numFilesCompactedCount

and 12 other metrics per region next to them.

Adding one or two more fields for the timestamps of last compaction started and
completed seems to be in line with what has already been done.

For HMaster you probably only want a summary JMX field per table similar to 
what is shown in the UI under table.jsp for whether a table is currently 
compacting:
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}
Adding a field next to Compaction showing Time of Last Compaction is in line 
with what has already been done.

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-27 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559548#comment-16559548
 ] 

Hari Sekhon commented on HBASE-20945:
-

Yes but the JMX API already exposes the following information per region in 
RegionServer JMX:
 * compactionsCompletedCount
 * numBytesCompactedCount
 * numFilesCompactedCount

and 12 other metrics per region next to them.

Adding one or two more fields for the timestamps of last compaction started and
completed seems to be in line with what has already been done.

For HMaster you probably only want a summary JMX field per table similar to 
what is shown in the UI under table.jsp for whether a table is currently 
compacting:
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}
Adding a field next to Compaction showing Time of Last Compaction is in line 
with what has already been done.
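
For context, the per-region compaction counters that exist today can already be
pulled straight out of a RegionServer's /jmx; the proposed timestamps would sit
alongside them. A rough sketch (the flattened key format shown is approximate,
and the host is a placeholder):
{code:python}
#!/usr/bin/env python3
# Rough sketch: dump the existing per-region compaction counters from a
# RegionServer's /jmx. The "..._metric_<name>" key format is approximate.
import json
from urllib.request import urlopen

HOST, PORT = 'regionserver1.example.com', 16030   # placeholders

url = ('http://%s:%d/jmx?qry=Hadoop:service=HBase,'
       'name=RegionServer,sub=Regions' % (HOST, PORT))
with urlopen(url) as resp:
    bean = json.load(resp)['beans'][0]

for key, value in sorted(bean.items()):
    if key.endswith(('_metric_compactionsCompletedCount',
                     '_metric_numBytesCompactedCount',
                     '_metric_numFilesCompactedCount')):
        print('%s = %s' % (key, value))
{code}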

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike will be there even though major compaction did not in fact 
> happen/complete.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI & JMX - warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Summary: HMaster UI & JMX - warn if Major Compaction hasn't occurred after 
hbase.hregion.majorcompaction time  (was: HMaster UI / JMX - warn if Major 
Compaction hasn't occurred after hbase.hregion.majorcompaction time)

> HMaster UI & JMX - warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, UI
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI + a JMX field should warn if Major Compaction hasn't occurred and 
> completed successfully after the scheduling interval 
> hbase.hregion.majorcompaction (default 7 days).
> Recent changes in HBASE-6028 mean major compactions could be cancelled and
> result in read amplification over time. The HMaster UI should monitor that the
> time since the last major compaction completed successfully isn't greater than
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.
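
A rough sketch of the check being proposed, assuming a hypothetical
lastMajorCompactionCompletedTimestamp field (milliseconds since epoch) were
exposed; no such field exists yet:
{code:python}
#!/usr/bin/env python3
# Rough sketch of the proposed staleness check. The JMX timestamp field it reads
# is hypothetical (as proposed in this ticket); it does not exist yet.
import time

MAJOR_COMPACTION_INTERVAL_MS = 7 * 24 * 3600 * 1000  # hbase.hregion.majorcompaction default
FUDGE_FACTOR = 1.2                                   # allow some scheduling slack

def major_compaction_overdue(last_completed_ms, interval_ms=MAJOR_COMPACTION_INTERVAL_MS):
    if interval_ms == 0:
        return False  # scheduled major compactions disabled: skip the check
    age_ms = time.time() * 1000 - last_completed_ms
    return age_ms > interval_ms * FUDGE_FACTOR

# example: a region whose last major compaction finished 10 days ago would warn
ten_days_ago_ms = (time.time() - 10 * 24 * 3600) * 1000
print('WARN' if major_compaction_overdue(ten_days_ago_ms) else 'OK')
{code}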



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20948) HBase Master JMX - expose field equivalent to HMaster UI - "Is the table compacting"

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20948:

Summary: HBase Master JMX - expose field equivalent to HMaster UI - "Is the 
table compacting"  (was: HBase Master JMX - expose field equivalent to HMaster 
UI's "Is table compacting")

> HBase Master JMX - expose field equivalent to HMaster UI - "Is the table 
> compacting"
> 
>
> Key: HBASE-20948
> URL: https://issues.apache.org/jira/browse/HBASE-20948
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, monitoring
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Expose a JMX field with the same info as HMaster UI's "Is the table
> compacting":
>  
> {code:java}
> Table Attributes
> Attribute Name   Value   Description
> Enabled          true    Is the table enabled
> Compaction       MINOR   Is the table compacting{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20948) HBase Master JMX - expose field equivalent to HMaster UI's "Is table compacting"

2018-07-26 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20948:
---

 Summary: HBase Master JMX - expose field equivalent to HMaster 
UI's "Is table compacting"
 Key: HBASE-20948
 URL: https://issues.apache.org/jira/browse/HBASE-20948
 Project: HBase
  Issue Type: Improvement
  Components: API, master, metrics, monitoring
Affects Versions: 1.1.2
Reporter: Hari Sekhon


Expose a JMX field with the same info as HMaster UI's "Is the table compacting":

 
{code:java}
Table Attributes
Attribute Name   Value   Description
Enabled          true    Is the table enabled
Compaction       MINOR   Is the table compacting{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20947) HMaster UI - expose JMX timestamps of last Major Compaction

2018-07-26 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20947:
---

 Summary: HMaster UI - expose JMX timestamps of last Major 
Compaction
 Key: HBASE-20947
 URL: https://issues.apache.org/jira/browse/HBASE-20947
 Project: HBase
  Issue Type: Improvement
  Components: Compaction, master, monitoring, UI
Affects Versions: 1.1.2
Reporter: Hari Sekhon


Expose the timestamp of the last major compaction in the HMaster UI alongside
the Compaction "Is the table compacting" field per table.

See HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Description: 
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

Ideally expose this JMX info in the HMaster UI as well.

  was:
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

This information would also be useful to be shown in the HMaster UI, but it's 
more important to be in the JMX for integration with lots of other tools.


> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all 

[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Description: 
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

  was:
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

Ideally expose this JMX info in the HMaster UI as well.


> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow 

[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Description: 
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

This information would also be useful to be shown in the HMaster UI, but it's 
more important to be in the JMX for integration with lots of other tools.

  was:
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully


> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more 

[jira] [Updated] (HBASE-20946) HMaster UI / JMX - warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Description: 
HMaster UI + a JMX field should warn if Major Compaction hasn't occurred and 
completed successfully after the scheduling interval 
hbase.hregion.majorcompaction (default 7 days).

Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
and result in read amplification over time. HMaster UI should monitor that the 
time since the last major compaction completed successfully isn't greater than 
hbase.hregion.majorcompaction (plus some fudge factor).

If hbase.hregion.majorcompaction is zero then disable the check, as external 
tooling will trigger major compactions off peak and should instead monitor 
against JMX timestamp of last major compaction as per HBASE-20945.

  was:
HMaster UI should warn if Major Compaction hasn't occurred and completed 
successfully after the scheduling interval hbase.hregion.majorcompaction 
(default 7 days).

Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
and result in read amplification over time. HMaster UI should monitor that the 
time since the last major compaction completed successfully isn't greater than 
hbase.hregion.majorcompaction (plus some fudge factor).

If hbase.hregion.majorcompaction is zero then disable the check, as external 
tooling will trigger major compactions off peak and should instead monitor 
against JMX timestamp of last major compaction as per HBASE-20945.


> HMaster UI / JMX - warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, UI
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI + a JMX field should warn if Major Compaction hasn't occurred and 
> completed successfully after the scheduling interval 
> hbase.hregion.majorcompaction (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.
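
For illustration, a rough sketch (under assumed names) of the warning rule described 
above: flag staleness against hbase.hregion.majorcompaction plus a fudge factor, and 
skip the check entirely when the interval is configured as zero:
{code:python}
import time

def major_compaction_overdue(last_completed_ms, majorcompaction_ms, fudge_factor=1.25):
    """Return True if the UI / JMX should flag the region (or table/cluster).

    last_completed_ms  -- epoch millis of last successful major compaction (proposed metric)
    majorcompaction_ms -- configured hbase.hregion.majorcompaction, in milliseconds
    fudge_factor       -- slack multiplier, since compactions are jittered and can run long
    """
    if majorcompaction_ms == 0:
        return False  # schedule disabled: external tooling owns the monitoring (HBASE-20945)
    age_ms = time.time() * 1000 - last_completed_ms
    return age_ms > majorcompaction_ms * fudge_factor
{code}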



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI / JMX - warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Component/s: UI
 API

> HMaster UI / JMX - warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, UI
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI should warn if Major Compaction hasn't occurred and completed 
> successfully after the scheduling interval hbase.hregion.majorcompaction 
> (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Component/s: monitoring
 metrics
 master
 Compaction

> HMaster UI warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, master, metrics, monitoring
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI should warn if Major Compaction hasn't occurred and completed 
> successfully after the scheduling interval hbase.hregion.majorcompaction 
> (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI / JMX - warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Summary: HMaster UI / JMX - warn if Major Compaction hasn't occurred after 
hbase.hregion.majorcompaction time  (was: HMaster UI warn if Major Compaction 
hasn't occurred after hbase.hregion.majorcompaction time)

> HMaster UI / JMX - warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction, master, metrics, monitoring
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI should warn if Major Compaction hasn't occurred and completed 
> successfully after the scheduling interval hbase.hregion.majorcompaction 
> (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Description: 
HMaster UI should warn if Major Compaction hasn't occurred and completed 
successfully after the scheduling interval hbase.hregion.majorcompaction 
(default 7 days).

Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
and result in read amplification over time. HMaster UI should monitor that the 
time since the last major compaction completed successfully isn't greater than 
hbase.hregion.majorcompaction (plus some fudge factor).

If hbase.hregion.majorcompaction is zero then disable the check, as external 
tooling will trigger major compactions off peak and should instead monitor 
against JMX timestamp of last major compaction as per HBASE-20945.

  was:
HMaster UI should warn if Major Compaction hasn't occurred and completed 
successfully within hbase.hregion.majorcompaction time (default 7 days).

Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
and result in read amplification over time. HMaster UI should monitor that the 
time since the last major compaction completed successfully isn't greater than 
hbase.hregion.majorcompaction (plus some fudge factor).

If hbase.hregion.majorcompaction is zero then disable the check, as external 
tooling will trigger major compactions off peak and should instead monitor 
against JMX timestamp of last major compaction as per HBASE-20945.


> HMaster UI warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI should warn if Major Compaction hasn't occurred and completed 
> successfully after the scheduling interval hbase.hregion.majorcompaction 
> (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20946) HMaster UI warn if Major Compaction hasn't occurred after hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20946:

Summary: HMaster UI warn if Major Compaction hasn't occurred after 
hbase.hregion.majorcompaction time  (was: HMaster UI warn if Major Compaction 
hasn't occurred + completed successfully within hbase.hregion.majorcompaction 
time)

> HMaster UI warn if Major Compaction hasn't occurred after 
> hbase.hregion.majorcompaction time
> 
>
> Key: HBASE-20946
> URL: https://issues.apache.org/jira/browse/HBASE-20946
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0, 1.1.2, 2.2.0
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI should warn if Major Compaction hasn't occurred and completed 
> successfully within hbase.hregion.majorcompaction time (default 7 days).
> Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
> and result in read amplification over time. HMaster UI should monitor that the 
> time since the last major compaction completed successfully isn't greater than 
> hbase.hregion.majorcompaction (plus some fudge factor).
> If hbase.hregion.majorcompaction is zero then disable the check, as external 
> tooling will trigger major compactions off peak and should instead monitor 
> against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20946) HMaster UI warn if Major Compaction hasn't occurred + completed successfully within hbase.hregion.majorcompaction time

2018-07-26 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20946:
---

 Summary: HMaster UI warn if Major Compaction hasn't occurred + 
completed successfully within hbase.hregion.majorcompaction time
 Key: HBASE-20946
 URL: https://issues.apache.org/jira/browse/HBASE-20946
 Project: HBase
  Issue Type: Improvement
Affects Versions: 1.1.2, 3.0.0, 2.2.0
Reporter: Hari Sekhon


HMaster UI should warn if Major Compaction hasn't occurred and completed 
successfully within hbase.hregion.majorcompaction time (default 7 days).

Recent changes in HBase HBASE-6028 mean major compactions could be cancelled 
and result in read amplification over time. HMaster UI should monitor that the 
time since the last major compaction completed successfully isn't greater than 
hbase.hregion.majorcompaction (plus some fudge factor).

If hbase.hregion.majorcompaction is zero then disable the check, as external 
tooling will trigger major compactions off peak and should instead monitor 
against JMX timestamp of last major compaction as per HBASE-20945.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Description: 
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be 
cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
compaction may not have happened if it got cancelled, so the compaction queue 
spike will be there even though major compaction did not in fact 
happen/complete.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully

  was:
Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be cancelled, 
the queue length doesn't tell the whole story as the compaction may not have 
happened if it got cancelled.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully


> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled in HBASE-6028, the queue length doesn't tell the whole story as the 
> compaction may not have happened if it got cancelled, so the compaction queue 
> spike 

[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Component/s: tooling
 regionserver
 monitoring
 metrics
 master
 Compaction
 API

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>  Components: API, Compaction, master, metrics, monitoring, 
> regionserver, tooling
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled, the queue length doesn't tell the whole story as the compaction 
> may not have happened if it got cancelled.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Summary: HBase JMX - timestamp of last Major Compaction (started, completed 
successfully)  (was: HBase JMX - timestamp of last major compaction (started, 
completed successfully))

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled, the queue length doesn't tell the whole story as the compaction 
> may not have happened if it got cancelled.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last Major Compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Affects Version/s: 1.1.2

> HBase JMX - timestamp of last Major Compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled, the queue length doesn't tell the whole story as the compaction 
> may not have happened if it got cancelled.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20945) HBase JMX - timestamp of last major compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558113#comment-16558113
 ] 

Hari Sekhon commented on HBASE-20945:
-

I'm also seeing a period of over 2 weeks between the 7th and 24th of July without 
spikes in compaction queue size, which may mean that major compaction doesn't 
impact the queue length noticeably, as on this cluster it is in fact scheduled 
to happen every 7 days per the default. Having those last major compaction 
timestamps would help monitoring integrations ensure that major compaction has 
actually happened within the last 7 days.

> HBase JMX - timestamp of last major compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled, the queue length doesn't tell the whole story as the compaction 
> may not have happened if it got cancelled.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20945) HBase JMX - timestamp of last major compaction (started, completed successfully)

2018-07-26 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20945:

Summary: HBase JMX - timestamp of last major compaction (started, completed 
successfully)  (was: HBase JMX - time of last major compaction)

> HBase JMX - timestamp of last major compaction (started, completed 
> successfully)
> 
>
> Key: HBASE-20945
> URL: https://issues.apache.org/jira/browse/HBASE-20945
> Project: HBase
>  Issue Type: Improvement
>Reporter: Hari Sekhon
>Priority: Major
>
> Request that the timestamp of the last major compaction be stored in JMX API 
> available at /jmx.
> Major Compactions may be disabled to better control scheduling to trigger off 
> peak (this is an old school recommendation), but there is a risk that the 
> major compaction doesn't happen in that case. Also people may trigger major 
> compactions manually and it's hard to see that (I've looked at graphs of 
> storefile counts where it's not obvious but I can infer it from spikes in 
> compaction queue length). Storing the last timestamps would allow all sorts 
> of scripting checks against the API much more simply than trying to infer it 
> from changes in graphs. Also with recent changes to allow compactions to be 
> cancelled, the queue length doesn't tell the whole story as the compaction 
> may not have happened if it got cancelled.
> Since major compactions may take hours and can also now be cancelled in the 
> latest versions of HBase, we need a few different fields added to JMX:
>  * HBase Master JMX:
>  ** timestamp that last major compaction was triggered, either manually via 
> major_compact command or via schedule
>  ** timestamp that last major compaction completed successfully (since 
> timestamp above could have been started and then later cancelled manually if 
> load was too high)
>  * HBase Regionserver JMX:
>  ** timestamp per region that last major compaction was triggered (there are 
> already compactionsCompletedCount, numBytesCompactedCount and 
> numFilesCompactedCount so it makes sense to add this next to those for each 
> region)
>  ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20945) HBase JMX - time of last major compaction

2018-07-26 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20945:
---

 Summary: HBase JMX - time of last major compaction
 Key: HBASE-20945
 URL: https://issues.apache.org/jira/browse/HBASE-20945
 Project: HBase
  Issue Type: Improvement
Reporter: Hari Sekhon


Request that the timestamp of the last major compaction be stored in JMX API 
available at /jmx.

Major Compactions may be disabled to better control scheduling to trigger off 
peak (this is an old school recommendation), but there is a risk that the major 
compaction doesn't happen in that case. Also people may trigger major 
compactions manually and it's hard to see that (I've looked at graphs of 
storefile counts where it's not obvious but I can infer it from spikes in 
compaction queue length). Storing the last timestamps would allow all sorts of 
scripting checks against the API much more simply than trying to infer it from 
changes in graphs. Also with recent changes to allow compactions to be cancelled, 
the queue length doesn't tell the whole story as the compaction may not have 
happened if it got cancelled.

Since major compactions may take hours and can also now be cancelled in the 
latest versions of HBase, we need a few different fields added to JMX:
 * HBase Master JMX:
 ** timestamp that last major compaction was triggered, either manually via 
major_compact command or via schedule
 ** timestamp that last major compaction completed successfully (since 
timestamp above could have been started and then later cancelled manually if 
load was too high)
 * HBase Regionserver JMX:
 ** timestamp per region that last major compaction was triggered (there are 
already compactionsCompletedCount, numBytesCompactedCount and 
numFilesCompactedCount so it makes sense to add this next to those for each 
region)
 ** timestamp per region that last major compaction completed successfully



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-24 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553998#comment-16553998
 ] 

Hari Sekhon commented on HBASE-20883:
-

I agree with you [~apurtell]... for the larger clusters they will need more 
tooling anyway, but a lot of customers these days are running smaller clusters 
of higher performance/density and tuning them to make use of the previously 
overspec'd enterprise hardware (that's what I'm doing right now in fact).

it's surprising what one can do with a dozen or two high spec machines these 
days, so this would be of benefit to a lot of these middle users who aren't 
internet giants, which seem to make up more of the user base in the last few 
years than the bigger name internet companies of old.

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.
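
For illustration, a toy example of the skew that a combined Total Requests Per Second 
figure masks: given per-RegionServer write rates (however obtained), a simple 
per-server comparison flags the outlier immediately. The 2x-of-mean threshold is 
arbitrary and purely illustrative:
{code:python}
def flag_write_skew(writes_per_sec_by_rs, factor=2.0):
    """Flag RegionServers whose write rate is more than `factor` times the mean."""
    mean = sum(writes_per_sec_by_rs.values()) / len(writes_per_sec_by_rs)
    return {rs: rate for rs, rate in writes_per_sec_by_rs.items() if rate > factor * mean}

# e.g. three servers doing ~900-950 writes/sec and one hotspotted RegionServer:
print(flag_write_skew({"rs1": 900.0, "rs2": 950.0, "rs3": 920.0, "rs4": 3100.0}))
# -> {'rs4': 3100.0}
{code}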



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-24 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553998#comment-16553998
 ] 

Hari Sekhon edited comment on HBASE-20883 at 7/24/18 9:24 AM:
--

I agree with you [~apurtell]... for the larger clusters they will need more 
tooling anyway, but a lot of customers these days are running smaller clusters 
of higher performance/density and tuning them to make use of the previously 
overspec'd enterprise hardware (that's what I'm doing right now in fact).

It's surprising what one can do with a dozen or two high spec machines these 
days, so this would be of benefit to a lot of these middle users who aren't 
internet giants, which seem to make up more of the user base in the last few 
years than the bigger name internet companies of old.


was (Author: harisekhon):
I agree with you [~apurtell]... for the larger clusters they will need more 
tooling anyway, but a lot of customers these days are running smaller clusters 
of higher performance/density and tuning them to make use of the previously 
overspec'd enterprise hardware (that's what I'm doing right now in fact).

it's surprising what one can do with a dozen or two high spec machines these 
days, so this would be of benefit to a lot of these middle users who aren't 
internet giants, which seem to make up more of the user base in the last few 
years than the bigger name internet companies of old.

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20904) Prometheus /metrics http endpoint for monitoring integration

2018-07-20 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20904:

Component/s: metrics

> Prometheus /metrics http endpoint for monitoring integration
> 
>
> Key: HBASE-20904
> URL: https://issues.apache.org/jira/browse/HBASE-20904
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics, monitoring
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to add Prometheus /metrics http endpoint for monitoring 
> integration:
> [https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E]
> Prometheus metrics format for that endpoint:
> [https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20904) Prometheus /metrics http endpoint for monitoring integration

2018-07-20 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20904:

Component/s: monitoring

> Prometheus /metrics http endpoint for monitoring integration
> 
>
> Key: HBASE-20904
> URL: https://issues.apache.org/jira/browse/HBASE-20904
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to add Prometheus /metrics http endpoint for monitoring 
> integration:
> [https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E]
> Prometheus metrics format for that endpoint:
> [https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20904) Prometheus /metrics http endpoint for monitoring integration

2018-07-19 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20904:

Description: 
Feature Request to add Prometheus /metrics http endpoint for monitoring 
integration:

[https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E]

Prometheus metrics format for that endpoint:

[https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md]

 

  was:
Feature Request to add Prometheus /metrics http endpoint for monitoring 
integration:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E


> Prometheus /metrics http endpoint for monitoring integration
> 
>
> Key: HBASE-20904
> URL: https://issues.apache.org/jira/browse/HBASE-20904
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hari Sekhon
>Priority: Major
>
> Feature Request to add Prometheus /metrics http endpoint for monitoring 
> integration:
> [https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E]
> Prometheus metrics format for that endpoint:
> [https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20904) Prometheus /metrics http endpoint for monitoring integration

2018-07-17 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20904:
---

 Summary: Prometheus /metrics http endpoint for monitoring 
integration
 Key: HBASE-20904
 URL: https://issues.apache.org/jira/browse/HBASE-20904
 Project: HBase
  Issue Type: New Feature
Reporter: Hari Sekhon


Feature Request to add Prometheus /metrics http endpoint for monitoring 
integration:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cscrape_config%3E
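
For illustration, a tiny sidecar-style sketch of what such a /metrics endpoint could 
serve: scrape the existing /jmx JSON and re-emit its numeric attributes in the 
Prometheus text exposition format. The host/port, listen port and naive metric naming 
are assumptions for the sketch, not a proposed implementation:
{code:python}
import json
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

JMX_URL = "http://regionserver.example.com:16030/jmx"  # RegionServer info server (placeholder)

def jmx_to_prometheus(jmx_json):
    """Flatten numeric /jmx attributes into Prometheus text exposition lines."""
    lines = []
    for bean in jmx_json.get("beans", []):
        bean_name = bean.get("name", "unknown")
        for attr, value in bean.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                metric = re.sub(r"[^a-zA-Z0-9_]", "_", "hbase_%s_%s" % (bean_name, attr))
                lines.append("%s %s" % (metric, value))
    return "\n".join(lines) + "\n"

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with urllib.request.urlopen(JMX_URL, timeout=10) as resp:
            body = jmx_to_prometheus(json.load(resp)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 9099), MetricsHandler).serve_forever()
{code}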



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-17 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16546211#comment-16546211
 ] 

Hari Sekhon commented on HBASE-20883:
-

Ok. Would it hurt to also expose that information in the UI, given that 
Total Requests Per Sec is already there and it's only 2 extra columns next to 
it with similarly small info? There is already a lot of info on the HMaster UI, 
so I can't imagine this would break the camel's back.

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-16 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545231#comment-16545231
 ] 

Hari Sekhon commented on HBASE-20883:
-

[~andrewcheng] thanks for mentioning the other ticket, but it's not exactly the 
same issue.

That one asks for more accurate counting to account for multi requests.

I'm just asking that the Read + Write Requests Per Sec are shown in the UI 
next to each RegionServer, which already shows the Total Requests Per Sec, to be 
able to detect read or write skew more easily.

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-16 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16545229#comment-16545229
 ] 

Hari Sekhon commented on HBASE-20883:
-

{quote}This won't scale
{quote}
The HMaster UI already shows Total Requests Per Sec next to each RegionServer, 
which I think is already calculated from readRequestCount + writeRequestCount 
or totalRequestCount differentials. It's just two more columns exposing that 
information in the existing table.

I already have OpenTSDB, but it's handy for some tools and scripts to be able 
to get this information from HBase directly - you may not want to have to set 
up OpenTSDB on top of HBase just to debug somebody's HBase installation. Since 
it appears that HMaster is already collecting and averaging the information, it 
doesn't seem like it hurts to expose that same information in JMX as well.
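
For reference, the per-second figures are just counter deltas divided by the 
sampling interval. A minimal sketch of that calculation (counter values are 
invented; readRequestCount / writeRequestCount are the counter names mentioned 
above):
{code:python}
def rates(sample_then, sample_now, interval_secs):
    """Turn two snapshots of monotonically increasing request counters
    (e.g. readRequestCount / writeRequestCount from a RegionServer's JMX)
    into requests-per-second figures."""
    return {key: (sample_now[key] - sample_then[key]) / float(interval_secs)
            for key in sample_then}

# Hypothetical counter snapshots taken 10 seconds apart:
then_sample = {"readRequestCount": 1200000, "writeRequestCount": 40000}
now_sample  = {"readRequestCount": 1500000, "writeRequestCount": 49000}

print(rates(then_sample, now_sample, 10))
# {'readRequestCount': 30000.0, 'writeRequestCount': 900.0}
{code}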

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-14644) Region in transition metric is broken

2018-07-13 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543016#comment-16543016
 ] 

Hari Sekhon edited comment on HBASE-14644 at 7/13/18 12:51 PM:
---

Did anyone check whether ritCountOverThreshold was actually fixed, and not just 
ritCount? I definitely saw ritCountOverThreshold showing zero while the HMaster 
UI showed regions stuck in transition:

See https://issues.apache.org/jira/browse/HBASE-16636?


was (Author: harisekhon):
Did anyone check under the scenario of having regions stuck in transition if 
ritCountOverThreshold was actually fixed as documented in 
https://issues.apache.org/jira/browse/HBASE-16636?

> Region in transition metric is broken
> -
>
> Key: HBASE-14644
> URL: https://issues.apache.org/jira/browse/HBASE-14644
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: huaxiang sun
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 2.0.0
>
> Attachments: HBASE-14644-v001.patch, HBASE-14644-v002-addendum.patch, 
> HBASE-14644-v002.patch, HBASE-14644-v002.patch, branch-1.diff
>
>
> ritCount stays 0 no matter what



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-14644) Region in transition metric is broken

2018-07-13 Thread Hari Sekhon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-14644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16543016#comment-16543016
 ] 

Hari Sekhon commented on HBASE-14644:
-

Did anyone check under the scenario of having regions stuck in transition if 
ritCountOverThreshold was actually fixed as documented in 
https://issues.apache.org/jira/browse/HBASE-16636?

> Region in transition metric is broken
> -
>
> Key: HBASE-14644
> URL: https://issues.apache.org/jira/browse/HBASE-14644
> Project: HBase
>  Issue Type: Bug
>Reporter: Elliott Clark
>Assignee: huaxiang sun
>Priority: Major
> Fix For: 1.3.0, 1.2.2, 2.0.0
>
> Attachments: HBASE-14644-v001.patch, HBASE-14644-v002-addendum.patch, 
> HBASE-14644-v002.patch, HBASE-14644-v002.patch, branch-1.diff
>
>
> ritCount stays 0 no matter what



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Description: 
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
JMX read/write/total request counts but since HMaster is collecting this info 
anyway it shouldn't be a big change to improve it to also show Reads / Writes 
Per Sec as well as Total.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also Per Region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.

  was:
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
JMX read/write/total request counts but since HMaster is collecting this info 
anyway it shouldn't be a big change to improve it to also show Reads / Writes 
Per Sec as well as Total.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.


> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also Per Region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Description: 
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
JMX read/write/total request counts but since HMaster is collecting this info 
anyway it shouldn't be a big change to improve it to also show Reads / Writes 
Per Sec as well as Total.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.

  was:
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
JMX read/write/total request counts but since HMaster is collecting this info 
anyway it shouldn't be a big change to improve it to also show Reads / Writes 
Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.


> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec as well as Total.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Description: 
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
JMX read/write/total request counts but since HMaster is collecting this info 
anyway it shouldn't be a big change to improve it to also show Reads / Writes 
Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.

  was:
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.


> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> JMX read/write/total request counts but since HMaster is collecting this info 
> anyway it shouldn't be a big change to improve it to also show Reads / Writes 
> Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Description: 
HMaster currently shows Requests Per Second per RegionServer under HMaster UI's 
/master-status page -> Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.

  was:
HMaster UI currently shows Requests Per Second per RegionServer under 
/master-status Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.


> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster currently shows Requests Per Second per RegionServer under HMaster 
> UI's /master-status page -> Region Servers -> Base Stats section in the Web 
> UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> but since HMaster is collecting this info anyway it shouldn't be a big change 
> to improve it to also show Reads / Writes Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read+Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Summary: HMaster Read+Write Requests Per Sec across RegionServers, 
currently only Total Requests Per Sec  (was: HMaster UI/JMX Read+Write Requests 
per sec across RegionServers)

> HMaster Read+Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec
> ---
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI currently shows Requests Per Second per RegionServer under 
> /mater-status Region Servers -> Base Stats section in the Web UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> but since HMaster is collecting this info anyway it shouldn't be a big change 
> to improve it to also show Reads / Writes Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Summary: HMaster Read / Write Requests Per Sec across RegionServers, 
currently only Total Requests Per Sec   (was: HMaster Read+Write Requests Per 
Sec across RegionServers, currently only Total Requests Per Sec)

> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI currently shows Requests Per Second per RegionServer under 
> /mater-status Region Servers -> Base Stats section in the Web UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> but since HMaster is collecting this info anyway it shouldn't be a big change 
> to improve it to also show Reads / Writes Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster Read / Write Requests Per Sec across RegionServers, currently only Total Requests Per Sec

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Description: 
HMaster UI currently shows Requests Per Second per RegionServer under 
/master-status Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.

  was:
HMaster UI currently shows Requests Per Second per RegionServer under 
/mater-status Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.


> HMaster Read / Write Requests Per Sec across RegionServers, currently only 
> Total Requests Per Sec 
> --
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI currently shows Requests Per Second per RegionServer under 
> /master-status Region Servers -> Base Stats section in the Web UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> but since HMaster is collecting this info anyway it shouldn't be a big change 
> to improve it to also show Reads / Writes Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20883) HMaster UI Read+Write Requests per sec across RegionServers

2018-07-13 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20883:
---

 Summary: HMaster UI Read+Write Requests per sec across 
RegionServers
 Key: HBASE-20883
 URL: https://issues.apache.org/jira/browse/HBASE-20883
 Project: HBase
  Issue Type: Improvement
  Components: Admin, master, metrics, monitoring, UI, Usability
Affects Versions: 1.1.2
Reporter: Hari Sekhon


HMaster UI currently shows Requests Per Second per RegionServer under 
/mater-status Region Servers -> Base Stats section in the Web UI.

Please add Reads Per Second and Writes Per Second per RegionServer alongside 
this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
information in the HMaster JMX API.

This will make it easier to find read or write hotspotting on HBase as a 
combined total will minimize and mask differences between RegionServers. For 
example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
so write skew will be masked as it won't show enough significant difference in 
the much larger combined Total Requests Per Second stat.

For now I've written a Python tool to calculate this info from RegionServers 
but since HMaster is collecting this info anyway it shouldn't be a big change 
to improve it to also show Reads / Writes Per Sec.

Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
and also per region at my [PyTools github 
repo|https://github.com/harisekhon/pytools] along with a selection of other 
HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20883) HMaster UI/JMX Read+Write Requests per sec across RegionServers

2018-07-13 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20883:

Summary: HMaster UI/JMX Read+Write Requests per sec across RegionServers  
(was: HMaster UI Read+Write Requests per sec across RegionServers)

> HMaster UI/JMX Read+Write Requests per sec across RegionServers
> ---
>
> Key: HBASE-20883
> URL: https://issues.apache.org/jira/browse/HBASE-20883
> Project: HBase
>  Issue Type: Improvement
>  Components: Admin, master, metrics, monitoring, UI, Usability
>Affects Versions: 1.1.2
>Reporter: Hari Sekhon
>Priority: Major
>
> HMaster UI currently shows Requests Per Second per RegionServer under 
> /mater-status Region Servers -> Base Stats section in the Web UI.
> Please add Reads Per Second and Writes Per Second per RegionServer alongside 
> this in the HMaster UI, and also expose the Read/Write/Total requests per sec 
> information in the HMaster JMX API.
> This will make it easier to find read or write hotspotting on HBase as a 
> combined total will minimize and mask differences between RegionServers. For 
> example, we do 30,000 reads/sec but only 900 writes/sec to each RegionServer, 
> so write skew will be masked as it won't show enough significant difference 
> in the much larger combined Total Requests Per Second stat.
> For now I've written a Python tool to calculate this info from RegionServers 
> but since HMaster is collecting this info anyway it shouldn't be a big change 
> to improve it to also show Reads / Writes Per Sec.
> Find my tools for more granular Read/Write Requests Per Sec Per Regionserver 
> and also per region at my [PyTools github 
> repo|https://github.com/harisekhon/pytools] along with a selection of other 
> HBase tools I've used for performance debugging over the years.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2018-07-09 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-16636:

Priority: Major  (was: Minor)

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Major
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20857) JMX - add Balancer status = enabled / disabled

2018-07-06 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20857:

Summary: JMX - add Balancer status = enabled / disabled  (was: JMX - add 
Balancer enabled/disabled status)

> JMX - add Balancer status = enabled / disabled
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Priority: Major
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster will give a warning near the top of the HMaster UI if 
> the balancer is disabled, but scraping this for monitoring integration is not 
> nice; it should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-20857) JMX - add Balancer enabled/disabled status

2018-07-06 Thread Hari Sekhon (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-20857:

Description: 
Add HBase Balancer enabled/disabled status to JMX API on HMaster.

Right now the HMaster will give a warning near the top of the HMaster UI if the 
balancer is disabled, but scraping this for monitoring integration is not nice; 
it should be available in the JMX API, as there is already a 
Master,sub=Balancer bean with metrics for the balancer ops etc.

  was:
Add HBase Balancer enabled/disabled status to JMX API on HMaster.

Right now the HMaster will give a pop up warning if balancer is disabled, but 
scraping this is not nice, it should be available in JMX API as there is 
already a Master,sub=Balancer bean with metrics for the balancer ops etc.


> JMX - add Balancer enabled/disabled status
> --
>
> Key: HBASE-20857
> URL: https://issues.apache.org/jira/browse/HBASE-20857
> Project: HBase
>  Issue Type: Improvement
>  Components: API, master, metrics, REST, tooling, Usability
>Reporter: Hari Sekhon
>Priority: Major
>
> Add HBase Balancer enabled/disabled status to JMX API on HMaster.
> Right now the HMaster will give a warning near the top of the HMaster UI if 
> the balancer is disabled, but scraping this for monitoring integration is not 
> nice; it should be available in the JMX API, as there is already a 
> Master,sub=Balancer bean with metrics for the balancer ops etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20857) JMX - add Balancer enabled/disabled status

2018-07-06 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-20857:
---

 Summary: JMX - add Balancer enabled/disabled status
 Key: HBASE-20857
 URL: https://issues.apache.org/jira/browse/HBASE-20857
 Project: HBase
  Issue Type: Improvement
  Components: API, master, metrics, REST, tooling, Usability
Reporter: Hari Sekhon


Add HBase Balancer enabled/disabled status to JMX API on HMaster.

Right now the HMaster will give a pop up warning if balancer is disabled, but 
scraping this is not nice, it should be available in JMX API as there is 
already a Master,sub=Balancer bean with metrics for the balancer ops etc.
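
Until such a metric exists, monitoring has to fall back to scraping the UI. A 
rough sketch of what that looks like (hostname, port and the matched warning 
text are assumptions, not an existing API - which is exactly why a JMX 
attribute would be nicer):
{code:python}
import re
import urllib.request

# Hypothetical HMaster UI endpoint; 16010 is the usual default UI port.
MASTER_STATUS_URL = "http://hmaster.example.com:16010/master-status"

def balancer_disabled(url=MASTER_STATUS_URL):
    html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")
    # Assumed marker text: the UI shows a warning mentioning the balancer when it is off.
    return re.search(r"balancer\s+is\s+(currently\s+)?(disabled|off)", html, re.I) is not None

if __name__ == "__main__":
    print("WARNING: balancer is disabled" if balancer_disabled()
          else "OK: no balancer-disabled warning found")
{code}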



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2016-09-21 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509629#comment-15509629
 ] 

Hari Sekhon edited comment on HBASE-16636 at 9/21/16 11:30 AM:
---

I've also noticed that ritCount and ritOldestAge were zero as well. Are all 3 
of these metrics fixed in the other jira?


was (Author: harisekhon):
I've also noticed that ritOldestAge was zero as well. Are all 3 of these 
metrics fixed in the other jira?

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Minor
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2016-09-21 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15509629#comment-15509629
 ] 

Hari Sekhon commented on HBASE-16636:
-

I've also noticed that ritOldestAge was zero as well. Are all 3 of these 
metrics fixed in the other jira?

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Minor
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2016-09-16 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495649#comment-15495649
 ] 

Hari Sekhon commented on HBASE-16636:
-

Hi [~huaxiang], it is related, although I've noticed that both the 
regions-in-transition count and the regions-in-transition-over-threshold count 
are zero. Are both fixed by that other issue?

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Minor
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2016-09-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-16636:

Attachment: ritCountOverThreshold.png
Regions_in_Transition_UI.png

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Minor
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting Regions Stuck in Transition

2016-09-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-16636:

Summary: Regions in Transition counts wrong (zero) in HMaster /jmx, 
prevents detecting Regions Stuck in Transition  (was: Regions in Transition 
counts wrong (zero) in HMaster /jmx)

> Regions in Transition counts wrong (zero) in HMaster /jmx, prevents detecting 
> Regions Stuck in Transition
> -
>
> Key: HBASE-16636
> URL: https://issues.apache.org/jira/browse/HBASE-16636
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 1.1.2
> Environment: HDP 2.3.2
>Reporter: Hari Sekhon
>Priority: Minor
> Attachments: Regions_in_Transition_UI.png, ritCountOverThreshold.png
>
>
> I've discovered that the Region in Transition counts are wrong in the HMaster 
> UI /jmx page.
> The /master-status page clearly shows 3 regions stuck in transition but the 
> /jmx page I was monitoring reported 0 for ritCountOverThreshold.
> {code}
> }, {
> "name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
> "modelerType" : "Master,sub=AssignmentManger",
> "tag.Context" : "master",
> ...
> "ritOldestAge" : 0,
> "ritCountOverThreshold" : 0,
> ...
> "ritCount" : 0,
> {code}
> I have a nagios plugin I wrote which was checking this which I've since had 
> to rewrite to parse the /master-status page instead (the code is in 
> check_hbase_regions_stuck_in_transition.py at 
> https://github.com/harisekhon/nagios-plugins).
> I'm attaching screenshots of both /master-status and /jmx to show the 
> difference in the 2 pages on the HMaster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16636) Regions in Transition counts wrong (zero) in HMaster /jmx

2016-09-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created HBASE-16636:
---

 Summary: Regions in Transition counts wrong (zero) in HMaster /jmx
 Key: HBASE-16636
 URL: https://issues.apache.org/jira/browse/HBASE-16636
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 1.1.2
 Environment: HDP 2.3.2
Reporter: Hari Sekhon
Priority: Minor


I've discovered that the Region in Transition counts are wrong in the HMaster 
UI /jmx page.

The /master-status page clearly shows 3 regions stuck in transition but the 
/jmx page I was monitoring reported 0 for ritCountOverThreshold.

{code}
}, {
"name" : "Hadoop:service=HBase,name=Master,sub=AssignmentManger",
"modelerType" : "Master,sub=AssignmentManger",
"tag.Context" : "master",
...
"ritOldestAge" : 0,
"ritCountOverThreshold" : 0,
...
"ritCount" : 0,
{code}

I have a nagios plugin I wrote which was checking this which I've since had to 
rewrite to parse the /master-status page instead (the code is in 
check_hbase_regions_stuck_in_transition.py at 
https://github.com/harisekhon/nagios-plugins).

I'm attaching screenshots of both /master-status and /jmx to show the 
difference in the 2 pages on the HMaster.
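
For reference, this is the kind of check the nagios plugin was doing against 
/jmx before I had to rewrite it - a minimal sketch (host/port are hypothetical; 
the bean and attribute names are as pasted above), which this bug defeats 
because the attribute stays at 0:
{code:python}
import json
import sys
import urllib.request

JMX_URL = "http://hmaster.example.com:16010/jmx"  # hypothetical HMaster host/port

def rit_count_over_threshold(url=JMX_URL):
    data = json.load(urllib.request.urlopen(url, timeout=10))
    for bean in data.get("beans", []):
        if bean.get("name") == "Hadoop:service=HBase,name=Master,sub=AssignmentManger":
            return int(bean.get("ritCountOverThreshold", 0))
    raise RuntimeError("AssignmentManger bean not found in /jmx output")

if __name__ == "__main__":
    count = rit_count_over_threshold()
    if count > 0:
        print("CRITICAL: %d regions stuck in transition" % count)
        sys.exit(2)
    print("OK: 0 regions stuck in transition")
{code}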



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions

2016-09-14 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15489762#comment-15489762
 ] 

Hari Sekhon commented on HBASE-6028:


I'm on site with a client that also wants the ability to cancel compactions - 
any chance this will get resolved soon?

If there were something in the API to allow it (and preferably also exposed in 
the HBase Shell, e.g. a 'cancel_compaction' command), that would be really, 
really useful.

> Implement a cancel for in-progress compactions
> --
>
> Key: HBASE-6028
> URL: https://issues.apache.org/jira/browse/HBASE-6028
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Derek Wollenstein
>Assignee: Esteban Gutierrez
>Priority: Minor
>  Labels: beginner
>
> Depending on current server load, it can be extremely expensive to run 
> periodic minor / major compactions.  It would be helpful to have a feature 
> where a user could use the shell or a client tool to explicitly cancel an 
> in-progress compactions.  This would allow a system to recover when too many 
> regions became eligible for compactions at once



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-8298) desc tablename shorthand for describe tablename, similar to how databases have

2014-06-09 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon reopened HBASE-8298:



I would still like one of these 2 patches integrated to provide the desc 
command in the hbase shell.

Anyone who's ever used a database knows 'desc', and it doesn't detract from 
anything - people are still able to use the longer form 'describe'.

 desc tablename shorthand for describe tablename, similar to how databases 
 have
 --

 Key: HBASE-8298
 URL: https://issues.apache.org/jira/browse/HBASE-8298
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.2
Reporter: Hari Sekhon
Priority: Trivial
 Attachments: desc.patch, desc2.patch


 It would be nice if you could type
 desc tablename
 in hbase shell as shorthand for
 describe tablename
 similar to how you can in traditional databases.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-8298) desc tablename shorthand for describe tablename, similar to how databases have

2013-06-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-8298:
---

Attachment: desc.patch

Here is the patch for the desc command in the HBase shell. I tested it on HBase 
0.94.6-cdh4.3.0 and then rebuilt the patch against Apache HBase git trunk just 
now.

 desc tablename shorthand for describe tablename, similar to how databases 
 have
 --

 Key: HBASE-8298
 URL: https://issues.apache.org/jira/browse/HBASE-8298
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.2
Reporter: Hari Sekhon
Priority: Trivial
  Labels: noob
 Attachments: desc.patch


 It would be nice if you could type
 desc tablename
 in hbase shell as shorthand for
 describe tablename
 similar to how you can in traditional databases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8298) desc tablename shorthand for describe tablename, similar to how databases have

2013-06-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated HBASE-8298:
---

Labels:   (was: noob)
Status: Patch Available  (was: Open)

I've attached the patch for the desc command in the HBase shell. I tested it on 
HBase 0.94.6-cdh4.3.0 and then rebuilt the patch against Apache HBase git trunk 
just now.

 desc tablename shorthand for describe tablename, similar to how databases 
 have
 --

 Key: HBASE-8298
 URL: https://issues.apache.org/jira/browse/HBASE-8298
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.2
Reporter: Hari Sekhon
Priority: Trivial
 Attachments: desc.patch


 It would be nice if you could type
 desc tablename
 in hbase shell as shorthand for
 describe tablename
 similar to how you can in traditional databases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8298) desc tablename shorthand for describe tablename, similar to how databases have

2013-06-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13684447#comment-13684447
 ] 

Hari Sekhon commented on HBASE-8298:


Yes, that would appear in the help command in the ddl command group next to 
describe, although it shouldn't be confusing since desc is the widely known 
shortcut for describe across various systems. I've intentionally inherited from 
Describe so the help documentation and implementation are exactly the same and 
will pick up any future changes made to Describe.

Right now the code wouldn't support a regex without a more significant rewrite 
of the hbase shell. It's all pre-defined command lists which dynamically eval 
the command files for the command classes containing the actual command method 
for each action, so a regex wouldn't know where to find the actual class 
definitions.

If you leave desc off the pre-defined command list, the desc class is never 
loaded. Even if you move the Desc class into describe.rb, which is loaded, you 
don't get the command mapping and end up with an error NoMethodError: undefined 
method...

The command mapping is a class inheritance variable in the base Shell module, 
so the only other reasonably easy way I can see to do this would be to add a 
special case in the base Shell module. That is a slightly more hackish way of 
doing it and not in keeping with the current command framework, and the desc 
command would not show in the ddl group for help - it would be a stealth 
shortcut.

 desc tablename shorthand for describe tablename, similar to how databases 
 have
 --

 Key: HBASE-8298
 URL: https://issues.apache.org/jira/browse/HBASE-8298
 Project: HBase
  Issue Type: Improvement
  Components: shell
Affects Versions: 0.94.2
Reporter: Hari Sekhon
Priority: Trivial
 Attachments: desc.patch


 It would be nice if you could type
 desc tablename
 in hbase shell as shorthand for
 describe tablename
 similar to how you can in traditional databases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

