[jira] [Comment Edited] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-12 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen edited comment on HBASE-21706 at 1/13/19 2:18 AM:
---

I don't have a running Apache HBase cluster right now, but I do see the same code 
in Apache HBase 1.1.2:
- [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
- [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]

I believe the Apache master branch also has the same code.

After a failover, the new active master picks up the correct filesystem 
("hdfs://DEV-CLUSTER"), but the new standby (the previously active master) ends 
up like the other standbys, with the value set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could 
anyone help me identify cases where `fs.defaultFS` on a standby master might 
actually be used?
Thanks
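Whether the mismatch matters depends on how the standby's value is consumed, which is the open question in this comment. The following is a minimal diagnostic sketch built on one assumption that this thread does not establish: consumers only care which filesystem `fs.defaultFS` names, meaning its URI scheme and authority. `DefaultFsCheck`, `fileSystemId` and `sameFileSystem` are hypothetical names. The sketch uses plain `java.net.URI` rather than any Hadoop or HBase API, and it expects values of the form `scheme://authority[/path]`.

```java
import java.net.URI;

/**
 * Hypothetical diagnostic helper: compares two fs.defaultFS values by URI scheme and
 * authority only, ignoring any path component.
 */
public class DefaultFsCheck {

    /** Returns "scheme://authority" for a value such as "hdfs://DEV-CLUSTER/hbase-root". */
    public static String fileSystemId(String defaultFs) {
        URI uri = URI.create(defaultFs);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    /** True when both values name the same filesystem, even if their paths differ. */
    public static boolean sameFileSystem(String a, String b) {
        return fileSystemId(a).equals(fileSystemId(b));
    }

    public static void main(String[] args) {
        String active = "hdfs://DEV-CLUSTER";             // value reported on the active master
        String standby = "hdfs://DEV-CLUSTER/hbase-root"; // value reported on the standbys
        System.out.println("identical strings: " + active.equals(standby));
        System.out.println("same filesystem:   " + sameFileSystem(active, standby));
    }
}
```

For the two reported values, `main` prints `identical strings: false` and then `same filesystem:   true`. Before writing a patch, you would still need to confirm whether any code on a standby master reads the path part of the value.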



> Inconsistency of fs.defaultFS between active and standby masters
> 
>
> Key: HBASE-21706
> URL: https://issues.apache.org/jira/browse/HBASE-21706
> Project: HBase
> Issue Type: Bug
> Components: conf, master
> Affects Versions: 1.1.2
> Reporter: Lei Chen
> Priority: Minor
>
> I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
> standby masters show different `fs.defaultFS` values on their /conf pages.
> Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
> core-site.xml on all the hosts, it looks like the standby masters have 
> `fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.
> For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
> shows the following entry on its /conf page:
> {code:xml}
> <property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
> {code}
> but the standby masters show:
> {code:xml}
> <property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
> {code}
> Please correct me if this is a feature rather than a bug; however, I find this 
> behavior surprising, and I cannot find any related documentation.
> From a quick look at the code, the cause seems to be that the standby masters 
> get the property set in 
> [HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
> while the active master sets it differently in 
> [MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].
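The two code paths referenced above, and the failover sequence from the comment, can be modeled in a few lines. This is a sketch under stated assumptions, not the HBase code:
- A `Map` stands in for Hadoop's `Configuration`.
- The method names are hypothetical.
- Every master first goes through the region-server path at startup (in the 1.x line `HMaster` extends `HRegionServer`).
- Only the master that becomes active runs the `MasterFileSystem` path.
- A failover is modeled as the old active master being restarted as a standby.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the two code paths described in this report. A plain Map stands in
 * for Hadoop's Configuration; none of these method names are HBase APIs.
 */
public class DefaultFsPaths {
    public static final String FS_DEFAULT = "fs.defaultFS";
    public static final String ROOT_DIR = "hbase.rootdir";

    /** Reported standby behavior: fs.defaultFS becomes the whole hbase.rootdir value. */
    public static void applyRootDirAsDefault(Map<String, String> conf) {
        conf.put(FS_DEFAULT, conf.get(ROOT_DIR));
    }

    /** Reported active behavior: fs.defaultFS keeps only the filesystem URI (scheme + authority). */
    public static void applyFileSystemUriAsDefault(Map<String, String> conf) {
        URI root = URI.create(conf.get(ROOT_DIR));
        conf.put(FS_DEFAULT, root.getScheme() + "://" + root.getAuthority());
    }

    /** Assumption: every master process first runs the region-server-style path at startup. */
    public static Map<String, String> startMaster(String rootDir) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ROOT_DIR, rootDir);
        applyRootDirAsDefault(conf);
        return conf;
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://DEV-CLUSTER/hbase-root";
        Map<String, String> masterA = startMaster(rootDir);
        Map<String, String> masterB = startMaster(rootDir);

        applyFileSystemUriAsDefault(masterA); // A becomes active and sets up its master filesystem
        System.out.println("A (active):  " + masterA.get(FS_DEFAULT));
        System.out.println("B (standby): " + masterB.get(FS_DEFAULT));

        // Failover, modeled as: B takes over, A is restarted and rejoins as a standby.
        applyFileSystemUriAsDefault(masterB);
        masterA = startMaster(rootDir);
        System.out.println("B (active):  " + masterB.get(FS_DEFAULT));
        System.out.println("A (standby): " + masterA.get(FS_DEFAULT));
    }
}
```

In this model, the active master always shows `hdfs://DEV-CLUSTER` and every standby shows `hdfs://DEV-CLUSTER/hbase-root`, whichever master currently holds the active role. That matches the pattern reported after failover.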



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-12 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen edited comment on HBASE-21706 at 1/13/19 2:17 AM:
---

I don't have a running Apache HBase cluster right now, but I do see the same code 
in Apache HBase 1.1.2:
- [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
- [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]

I believe the Apache master branch also has the same code.

After a failover, the new active master picks up the correct filesystem 
("hdfs://DEV-CLUSTER"), but the new standby (the previously active master) ends 
up like the other standbys, with the value set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could 
anyone help me identify cases where `fs.defaultFS` on a standby master might 
actually be used?
Thanks




[jira] [Commented] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-12 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741442#comment-16741442 ]

Lei Chen commented on HBASE-21706:
--

I don't have a running Apache HBase cluster right now, but I do see the same code 
in Apache HBase 1.1.2:
- [HRegionServer.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L565]
- [MasterFileSystem.java|https://github.com/apache/hbase/blob/rel/1.1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L121]

I believe the Apache master branch also has the same code.

After a failover, the new active master picks up the correct filesystem 
("hdfs://DEV-CLUSTER"), but the new standby (the previously active master) ends 
up like the other standbys, with the value set to "hdfs://DEV-CLUSTER/hbase-root".

I would like to provide a patch if this is indeed unexpected behavior, but could 
anyone help me identify cases where `fs.defaultFS` on a standby master might 
actually be used?
Thanks



[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-11 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-
Description: 
I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}
Please correct me if this is a feature rather than a bug; however, I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

  was:
I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}
Please correct me if this is a feature rather than a bug, but I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].




[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-11 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-
Description: 
I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}
Please correct me if this is a feature rather than a bug; however, I find this 
behavior surprising, and I cannot find any related documentation.

From a quick look at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

  was:
I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}
Please correct me if this is a feature rather than a bug; however, I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].




[jira] [Updated] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-11 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-21706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chen updated HBASE-21706:
-
Description: 
I'm using HDP-2.6.3.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}
Please correct me if this is a feature rather than a bug, but I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].

  was:
I'm using HDP-2.3.6.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}

Please correct me if this is a feature rather than a bug, but I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].




[jira] [Created] (HBASE-21706) Inconsistency of fs.defaultFS between active and standby masters

2019-01-11 Thread Lei Chen (JIRA)
Lei Chen created HBASE-21706:


Summary: Inconsistency of fs.defaultFS between active and standby masters
Key: HBASE-21706
URL: https://issues.apache.org/jira/browse/HBASE-21706
Project: HBase
Issue Type: Bug
Components: conf, master
Affects Versions: 1.1.2
Reporter: Lei Chen


I'm using HDP-2.3.6.22-1 with HBase HA configured. I noticed that the active and 
standby masters show different `fs.defaultFS` values on their /conf pages.
Given that `fs.defaultFS` is set to : and `hbase.rootdir` is set to :/ in 
core-site.xml on all the hosts, it looks like the standby masters have 
`fs.defaultFS` programmatically set to the same value as `hbase.rootdir`.

For example, on DEV-CLUSTER, a cluster with three head nodes, my active master 
shows the following entry on its /conf page:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER</value><source>programatically</source></property>
{code}
but the standby masters show:
{code:xml}
<property><name>fs.defaultFS</name><value>hdfs://DEV-CLUSTER/hbase-root</value><source>programatically</source></property>
{code}

Please correct me if this is a feature rather than a bug, but I find this 
behavior surprising, and I cannot find any related documentation.

From a quick looking at the code, the cause seems to be that the standby masters 
get the property set in 
[HRegionServer.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java#L649-L652],
while the active master sets it differently in 
[MasterFileSystem.java|https://github.com/hortonworks/hbase-release/blob/HDP-2.6.3.22-tag/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java#L132-L137].





[jira] [Commented] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows

2018-10-05 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639905#comment-16639905 ]

Lei Chen commented on HBASE-16423:
--

I'm facing the false-positive inconsistency problem you described here.
Having the thread sleep and compare again some time later looks like a good way 
to reduce noise, but it is not a guaranteed way to report real inconsistencies. 
As long as ingestion is running, it is possible that by the time of the 
re-comparison, the target row on the source and on the replica has matched and 
then diverged again. A more sophisticated method may be required if users need 
100% confidence.

> Add re-compare option to VerifyReplication to avoid occasional inconsistent 
> rows
> 
>
> Key: HBASE-16423
> URL: https://issues.apache.org/jira/browse/HBASE-16423
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Affects Versions: 2.0.0
> Reporter: Jianwei Cui
> Assignee: Jianwei Cui
> Priority: Minor
> Fix For: 1.4.0, 2.0.0
>
> Attachments: HBASE-16423-branch-1-v1.patch, HBASE-16423-v1.patch, HBASE-16423-v2.patch, HBASE-16423-v3.patch
>
>
> Because replication is eventually consistent, VerifyReplication may report 
> inconsistent rows if data is being written to the source or peer cluster 
> during the scan. These transiently inconsistent rows will have the same data 
> if we do the comparison again after a short period. It is hard to find the 
> truly inconsistent rows if VerifyReplication reports a large number of such 
> transient inconsistencies. To avoid this, we can add an option that makes 
> VerifyReplication re-read the inconsistent rows after sleeping a few seconds 
> and re-compare them during the scan. This behavior is in line with the 
> eventual consistency of HBase replication. Suggestions and discussion are 
> welcome.
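The option described above amounts to a compare, sleep, re-compare loop. Here is a minimal sketch of it. The `readSource`/`readPeer` functions are hypothetical stand-ins for real HBase reads. `ReCompareSketch`, `findInconsistentRows` and `sleepMsBeforeReCompare` are illustrative names, not VerifyReplication's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical "compare, sleep, re-compare" loop. The readers are plain functions standing in
 * for source and peer lookups; this is not VerifyReplication's actual code.
 */
public class ReCompareSketch {

    public static List<String> findInconsistentRows(List<String> rowKeys,
                                                    Function<String, String> readSource,
                                                    Function<String, String> readPeer,
                                                    long sleepMsBeforeReCompare) {
        List<String> reported = new ArrayList<>();
        for (String row : rowKeys) {
            if (Objects.equals(readSource.apply(row), readPeer.apply(row))) {
                continue; // consistent on the first read
            }
            sleepQuietly(sleepMsBeforeReCompare); // give replication time to catch up
            if (!Objects.equals(readSource.apply(row), readPeer.apply(row))) {
                reported.add(row); // still different after the re-read
            }
        }
        return reported;
    }

    private static void sleepQuietly(long ms) {
        if (ms <= 0) {
            return;
        }
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag; re-compare right away
        }
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("r1", "a");
        source.put("r2", "b");
        source.put("r3", "c");
        Map<String, String> peer = new HashMap<>();
        peer.put("r1", "a");
        peer.put("r2", "old"); // lagging: catches up right after its first read
        peer.put("r3", "x");   // genuinely different
        Function<String, String> laggingPeer = row -> {
            String value = peer.get(row);
            if (row.equals("r2")) {
                peer.put("r2", "b"); // simulated replication catching up
            }
            return value;
        };
        System.out.println(findInconsistentRows(List.of("r1", "r2", "r3"), source::get, laggingPeer, 10));
    }
}
```

Running `main` prints `[r3]`. Row `r2` is dropped because the simulated peer catches up before the second read, while `r3` is still reported. As the comment earlier in this thread notes, this only reduces noise while ingestion is running. A row can catch up and then diverge again before the re-read, so it is still reported even though replication is working.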





[jira] [Comment Edited] (HBASE-16423) Add re-compare option to VerifyReplication to avoid occasional inconsistent rows

2018-10-05 Thread Lei Chen (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-16423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16639905#comment-16639905 ]

Lei Chen edited comment on HBASE-16423 at 10/5/18 2:50 PM:
---

I'm facing the false-positive inconsistency problem you described here as well.
Having the thread sleep and compare again some time later looks like a good way 
to reduce noise, but it is not a guaranteed way to report real inconsistencies. 
As long as ingestion is running, it is possible that by the time of the 
re-comparison, the target row on the source and on the replica has matched and 
then diverged again. A more sophisticated method may be required if users need 
100% confidence.




[jira] [Comment Edited] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-11 Thread Lei Chen (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621 ]

Lei Chen edited comment on HBASE-18005 at 5/11/17 5:18 PM:
---

Thanks for the explanation and update.
Yes, there is a gap between the primary meta region and its replica, bounded by 
hbase.regionserver.meta.storefile.refresh.period. Since there is also no 
notification mechanism at present, setting hbase.meta.replica.count to 2 or 3 
is indeed not a complete solution, but it is an improvement.
Meanwhile, there is also a gap between the primary meta region and the copy 
cached on the client side.
The difference between the two gaps is how each one is closed: the first is 
refreshed at a fixed interval, while the second is updated when a miss is 
encountered.
Please correct me if I'm wrong, but the worst case I can imagine is:
1. The locations of a primary region p1 and its replica r1 have changed.
2. The primary meta region is updated, but its replica is not yet, due to the 
fixed refresh interval.
3. The region server that serves both primaries (p1 and the primary meta 
region) goes down.
4. A client that has not updated its meta cache since p1 and r1 were relocated 
now makes a get request to p1.
In this case, neither the client cache nor the meta replica can provide the 
correct location of the target regions.

That being said, I agree with you that the cached locations of the replicas are 
still worth trying, and should be spared when the meta cache is cleared, as you 
have proposed in the patch.
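To make the first gap concrete, here is a toy model, assuming the meta replica refreshes exactly at multiples of hbase.regionserver.meta.storefile.refresh.period (a simplification; the class and method names are hypothetical). A lookup that falls between a relocation and the next refresh gets the old server from the replica, which is step 2 of the worst case described in the comment:

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (hypothetical) of the gap between the primary hbase:meta region and a
 * meta replica that refreshes its store files every refreshPeriodMs. Simplifying
 * assumption: refreshes happen exactly at multiples of the period, times are >= 0.
 */
final class MetaReplicaGap {
  private final long refreshPeriodMs;
  // Location history of one user region as written to the primary meta: time -> server.
  private final TreeMap<Long, String> primaryHistory = new TreeMap<>();

  MetaReplicaGap(long refreshPeriodMs) {
    if (refreshPeriodMs <= 0) {
      throw new IllegalArgumentException("period must be positive");
    }
    this.refreshPeriodMs = refreshPeriodMs;
  }

  /** The region moved to {@code server} at {@code timeMs}; the primary meta sees it at once. */
  void relocate(long timeMs, String server) {
    primaryHistory.put(timeMs, server);
  }

  /** Location returned by the primary meta region at timeMs. */
  String primaryLookup(long timeMs) {
    Map.Entry<Long, String> e = primaryHistory.floorEntry(timeMs);
    return e == null ? null : e.getValue();
  }

  /** Location returned by the meta replica: the primary's state as of its last refresh. */
  String replicaLookup(long timeMs) {
    long lastRefresh = (timeMs / refreshPeriodMs) * refreshPeriodMs;
    return primaryLookup(lastRefresh);
  }
}
```

With a 1000 ms period, a region moved at t=1500 is still reported at its old location by the replica at t=1800, and only the refresh at t=2000 closes the gap.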


was (Author: leochen4891):
Thanks for the explanation and update.
Yes, there is a gap between the primary meta region and its replica, bounded by 
hbase.regionserver.meta.storefile.refresh.period. Since there is also no 
notification mechanism at present, setting hbase.meta.replica.count to 2 or 3 
is indeed not a complete solution, but it is an improvement.
Meanwhile, there is also a gap between the primary meta region and the copy 
cached on the client side.
The difference between the two gaps is how each one is closed: the first is 
refreshed at a fixed interval, while the second is updated when a miss is 
encountered.
Please correct me if I'm wrong, but the worst case I can imagine is:
1. The locations of a primary region p1 and its replica r1 have changed.
2. The primary meta region is updated, but its replica is not yet, due to the 
fixed refresh interval.
3. The region server that serves both primaries (p1 and the primary meta 
region) goes down.
4. A client that has not updated its meta cache since p1 and r1 were relocated 
now makes a get request to p1.

That being said, I agree with you that the cached locations of the replicas are 
still worth trying, and should be spared when the meta cache is cleared, as you 
have proposed in the patch.

> read replica: handle the case that region server hosting both primary replica 
> and meta region is down
> -
>
> Key: HBASE-18005
> URL: https://issues.apache.org/jira/browse/HBASE-18005
> Project: HBase
>  Issue Type: Bug
>Reporter: huaxiang sun
>Assignee: huaxiang sun
> Attachments: HBASE-18005-master-001.patch
>
>
> Identified one corner case in testing: when the region server hosting both 
> the primary replica and the meta region is down, the client tries to reload 
> the primary replica location from the meta table. It is supposed to clear 
> only the cached location for the specific replicaId, but it clears the caches 
> for all replicas. Please see
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L813
> Since it takes some time for regions to be reassigned (including the meta 
> region), the following may throw an exception:
> https://github.com/apache/hbase/blob/master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RpcRetryingCallerWithReadReplicas.java#L173
> This exception needs to be caught, and the client then needs to use the cached 
> locations (in this case, the primary replica's location is not available). If 
> there are cached locations for other replicas, it can still go ahead and get 
> stale values from the secondary replicas.
> With meta replicas, it still helps not to clear the caches for all replicas, 
> as the info from the primary meta replica is up-to-date.
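A minimal sketch of the difference between the two clearing strategies, assuming a simple map from replicaId to server name for a single region. All names are hypothetical; this is not the actual ConnectionImplementation code. Clearing only the failed replica's entry keeps the secondary locations available for stale reads while the primary and meta regions are being reassigned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical per-region location cache keyed by replicaId (0 = primary).
 * It only illustrates the two clearing strategies described in the issue.
 */
final class ReplicaLocationCache {
  private final Map<Integer, String> serverByReplicaId = new TreeMap<>();

  void put(int replicaId, String server) {
    serverByReplicaId.put(replicaId, server);
  }

  /** What the issue reports today: one failed reload drops every replica's location. */
  void clearAllReplicas() {
    serverByReplicaId.clear();
  }

  /** What the issue asks for: drop only the location of the replica that failed. */
  void clearReplica(int replicaId) {
    serverByReplicaId.remove(replicaId);
  }

  /** Cached secondary locations still usable for stale (TIMELINE) reads, in replicaId order. */
  List<String> secondaryLocations() {
    List<String> result = new ArrayList<>();
    for (Map.Entry<Integer, String> e : serverByReplicaId.entrySet()) {
      if (e.getKey() != 0) {
        result.add(e.getValue());
      }
    }
    return result;
  }
}
```

After `clearReplica(0)` a region cached on three servers still offers its two secondary locations; after `clearAllReplicas()` the client has nothing to fall back on until meta is reachable again.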





[jira] [Comment Edited] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-11 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621
 ] 

Lei Chen edited comment on HBASE-18005 at 5/11/17 3:18 PM:
---

Thanks for the explanation and update.
Yes, there is a gap between the primary meta region and its replica, bounded by 
hbase.regionserver.meta.storefile.refresh.period. Since there is also no 
notification mechanism at present, setting hbase.meta.replica.count to 2 or 3 
is indeed not a complete solution, but it is an improvement.
Meanwhile, there is also a gap between the primary meta region and the copy 
cached on the client side.
The difference between the two gaps is how each one is closed: the first is 
refreshed at a fixed interval, while the second is updated when a miss is 
encountered.
Please correct me if I'm wrong, but the worst case I can imagine is:
1. The locations of a primary region p1 and its replica r1 have changed.
2. The primary meta region is updated, but its replica is not yet, due to the 
fixed refresh interval.
3. The region server that serves both primaries (p1 and the primary meta 
region) goes down.
4. A client that has not updated its meta cache since p1 and r1 were relocated 
now makes a get request to p1.

That being said, I agree with you that the cached locations of the replicas are 
still worth trying, and should be spared when the meta cache is cleared, as you 
have proposed in the patch.


was (Author: leochen4891):
Thanks for the explanation and update.
Yes, there is a gap between the primary meta region and its replica, bounded by 
hbase.regionserver.meta.storefile.refresh.period. Since there is also no 
notification mechanism at present, setting hbase.meta.replica.count to 2 or 3 
is indeed not a complete solution, but it is an improvement.
Meanwhile, there is also a gap between the primary meta region and the copy 
cached on the client side.
The difference between the two gaps is how each one is closed: the first is 
refreshed at a fixed interval, while the second is updated when it sees a miss.
Please correct me if I'm wrong, but the worst case I can imagine is:
1. The locations of a primary region p1 and its replica r1 have changed.
2. The primary meta region is updated, but its replica is not yet, due to the 
fixed refresh interval.
3. The region server that serves both primaries (p1 and the primary meta 
region) goes down.
4. A client that has not updated its meta cache since p1 and r1 were relocated 
now makes a get request to p1.

That being said, I agree with you that the cached locations of the replicas are 
still worth trying, and should be spared when the meta cache is cleared, as you 
have proposed in the patch.






[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-11 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006621#comment-16006621
 ] 

Lei Chen commented on HBASE-18005:
--

Thanks for the explanation and update.
Yes, there is a gap between the primary meta region and its replica, bounded by 
hbase.regionserver.meta.storefile.refresh.period. Since there is also no 
notification mechanism at present, setting hbase.meta.replica.count to 2 or 3 
is indeed not a complete solution, but it is an improvement.
Meanwhile, there is also a gap between the primary meta region and the copy 
cached on the client side.
The difference between the two gaps is how each one is closed: the first is 
refreshed at a fixed interval, while the second is updated when it sees a miss.
Please correct me if I'm wrong, but the worst case I can imagine is:
1. The locations of a primary region p1 and its replica r1 have changed.
2. The primary meta region is updated, but its replica is not yet, due to the 
fixed refresh interval.
3. The region server that serves both primaries (p1 and the primary meta 
region) goes down.
4. A client that has not updated its meta cache since p1 and r1 were relocated 
now makes a get request to p1.

That being said, I agree with you that the cached locations of the replicas are 
still worth trying, and should be spared when the meta cache is cleared, as you 
have proposed in the patch.






[jira] [Commented] (HBASE-18005) read replica: handle the case that region server hosting both primary replica and meta region is down

2017-05-10 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16005805#comment-16005805
 ] 

Lei Chen commented on HBASE-18005:
--

Hi Huaxiang,
I'm running into this issue as well. Good work!

May I ask which version of HBase you are using? 
I ask because I'm using HBase 1.1.2, which doesn't have the fix for a known 
meta table replication issue (https://issues.apache.org/jira/browse/HBASE-17238).
With HBASE-17238 in place, setting hbase.meta.replica.count to a number greater 
than 1 should be able to handle the case where the primary regions of a normal 
table and of the meta table are both down.

I'm curious: are you in the same situation as me, unable to set 
hbase.meta.replica.count to, say, 3?
(https://hbase.apache.org/book.html#_server_side_properties)
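For reference, a minimal hbase-site.xml sketch of the two meta replica properties discussed in this thread. The values are illustrative assumptions, not recommendations, and on a release without the HBASE-17238 fix (such as 1.1.2) the meta replicas may not behave as expected:

```xml
<!-- hbase-site.xml (server side); values are illustrative -->
<property>
  <!-- Number of hbase:meta replicas, including the primary -->
  <name>hbase.meta.replica.count</name>
  <value>3</value>
</property>
<property>
  <!-- Interval (ms) at which secondary hbase:meta replicas refresh their store files -->
  <name>hbase.regionserver.meta.storefile.refresh.period</name>
  <value>300000</value>
</property>
```

A shorter refresh period narrows the window in which a meta replica can return a stale location, at the cost of more frequent store file refreshes.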







[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-16 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791565#comment-14791565
 ] 

Lei Chen commented on HBASE-14082:
--

Thank you all for helping me all the way.

> Add replica id to JMX metrics names
> ---
>
> Key: HBASE-14082
> URL: https://issues.apache.org/jira/browse/HBASE-14082
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Lei Chen
>Assignee: Lei Chen
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14082-v6.patch, HBASE-14082-v1.patch, 
> HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, 
> HBASE-14082-v5.patch
>
>
> Today, via JMX, one cannot distinguish a primary region from a replica. A 
> possible solution is to add the replica id to JMX metrics names. The benefits 
> may include, for example:
> # Knowing the latency of read requests on a replica region is useful, since 
> such a request means the first attempt to the primary region timed out.
> # Write requests on replicas come from the replication process, while the 
> ones on the primary come from clients.
> # When looking for hot spots of read operations, replicas should be 
> excluded, since TIMELINE reads are sent to all replicas.
> To implement, we can change the format of metrics names found at 
> {code}Hadoop->HBase->RegionServer->Regions->Attributes{code}
> from 
> {code}namespace_<namespace>_table_<tablename>_region_<regionname>_metric_<metricname>{code}
> to
> {code}namespace_<namespace>_table_<tablename>_region_<regionname>_replicaid_<replicaid>_metric_<metricname>{code}
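A sketch of how the proposed name format could be built and consumed, for example to keep only primary-replica metrics when hunting read hot spots. The helper names are hypothetical, not an HBase API, and the parser assumes no name component itself contains "_replicaid_" or "_metric_". Note that, per the v3 update in this thread, the replica id was eventually moved to a separate metric, so this only illustrates the original naming proposal:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helpers for the proposed metric-name format; not an HBase API.
 * Assumes no name component itself contains "_replicaid_" or "_metric_".
 */
final class RegionMetricNames {
  private static final String REPLICA = "_replicaid_";
  private static final String METRIC = "_metric_";

  static String build(String namespace, String table, String region, int replicaId, String metric) {
    return "namespace_" + namespace + "_table_" + table + "_region_" + region
        + REPLICA + replicaId + METRIC + metric;
  }

  /** Parses the replica id back out of a name built by {@link #build}. */
  static int replicaId(String name) {
    int idx = name.indexOf(REPLICA);
    if (idx < 0) {
      throw new IllegalArgumentException("no replica id in " + name);
    }
    int start = idx + REPLICA.length();
    int end = name.indexOf(METRIC, start);
    if (end < 0) {
      throw new IllegalArgumentException("no metric in " + name);
    }
    return Integer.parseInt(name.substring(start, end));
  }

  /** Keeps primary-replica (replicaId 0) metrics only, e.g. when looking for read hot spots. */
  static Map<String, Long> primaryOnly(Map<String, Long> metrics) {
    Map<String, Long> result = new HashMap<>();
    for (Map.Entry<String, Long> e : metrics.entrySet()) {
      if (replicaId(e.getKey()) == 0) {
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }
}
```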





[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-09-10 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739838#comment-14739838
 ] 

Lei Chen commented on HBASE-14082:
--

Thanks for rebasing the patch.

> Add replica id to JMX metrics names
> ---
>
> Key: HBASE-14082
> URL: https://issues.apache.org/jira/browse/HBASE-14082
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Lei Chen
>Assignee: Lei Chen
> Fix For: 2.0.0
>
> Attachments: 14082-v6.patch, HBASE-14082-v1.patch, 
> HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, 
> HBASE-14082-v5.patch





[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict

2015-08-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708504#comment-14708504
 ] 

Lei Chen commented on HBASE-14293:
--

Forgot the link to the function: 
[here|http://svn.apache.org/viewvc/camel/trunk/components/camel-test/src/main/java/org/apache/camel/test/AvailablePortFinder.java?view=markup#l130]

 TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
 --

 Key: HBASE-14293
 URL: https://issues.apache.org/jira/browse/HBASE-14293
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 2.0.0

 Attachments: 14293-v1.txt, 14293-v2.txt


 From 
 https://builds.apache.org/job/HBase-TRUNK/6748/testReport/junit/org.apache.hadoop.hbase/TestStochasticBalancerJmxMetrics/testJmxMetrics_EnsembleMode/
  :
 {code}
 2015-08-22 20:46:07,939 ERROR [M:0;asf900:59022] 
 coprocessor.CoprocessorHost(518): The coprocessor 
 org.apache.hadoop.hbase.JMXListener threw java.rmi.server.ExportException: 
 Port already in use: 61120; nested exception is: 
   java.net.BindException: Address already in use
 java.rmi.server.ExportException: Port already in use: 61120; nested exception 
 is: 
   java.net.BindException: Address already in use
   at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329)
   at 
 sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:237)
   at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411)
 ...
 2015-08-22 20:49:41,755 DEBUG [main] 
 hbase.TestStochasticBalancerJmxMetrics(93): Encountered exception when 
 starting cluster. Trying port 61122
 java.lang.IllegalStateException: A mini-cluster is already running
   at 
 org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:981)
   at 
 org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:872)
   at 
 org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:866)
   at 
 org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:810)
   at 
 org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics.setupBeforeClass(TestStochasticBalancerJmxMetrics.java:89)
 {code}
 When a port conflict is detected, we try the next port.
 However, HTU#miniClusterRunning is already true at that point, which leads to 
 the second exception shown above.
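One way to make the existing "try the next port" loop safe is to shut down the half-started cluster before each retry. A minimal sketch, assuming a hypothetical MiniCluster interface that stands in for the relevant part of HBaseTestingUtility (whose real methods throw checked exceptions). In the log above the BindException is nested inside an ExportException, so the sketch walks the cause chain instead of matching the top-level type:

```java
import java.net.BindException;

/** Hypothetical stand-in for the part of HBaseTestingUtility this test needs. */
interface MiniCluster {
  void start(int jmxPort); // throws an unchecked exception if the cluster cannot start
  void shutdown();
}

final class StartWithPortRetry {
  /** Tries consecutive ports, shutting down the half-started cluster between attempts. */
  static int start(MiniCluster cluster, int firstPort, int maxAttempts) {
    int port = firstPort;
    for (int attempt = 0; attempt < maxAttempts; attempt++, port++) {
      try {
        cluster.start(port);
        return port;
      } catch (RuntimeException e) {
        if (!isPortConflict(e)) {
          throw e; // unrelated failure: do not mask it
        }
        // The cluster may already count as running, so tear it down first;
        // otherwise the next start fails with "A mini-cluster is already running".
        cluster.shutdown();
      }
    }
    throw new IllegalStateException(
        "no free port in " + maxAttempts + " attempts starting at " + firstPort);
  }

  /** True if a BindException appears anywhere in the cause chain. */
  static boolean isPortConflict(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof BindException) {
        return true;
      }
    }
    return false;
  }
}
```

Bounding the attempts also keeps a persistently failing environment from looping forever.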





[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict

2015-08-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708502#comment-14708502
 ] 

Lei Chen commented on HBASE-14293:
--

Patch v1 looks good.
I'm also wondering: since shutting down and restarting the miniCluster may be 
costly, would it be better to pick an available port before starting the 
miniCluster, using a function like this?
For example, randomly choose a port and test it. If it is available, start the 
miniCluster; otherwise, try up to 10 other random ports.
What do you think?
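A minimal sketch of the "pick a free port first" idea, assuming plain java.net.ServerSocket probing; it is not the Camel AvailablePortFinder code linked in this thread, and the class name is hypothetical. The check is inherently racy: another process can grab the port between the probe and the real bind, so a retry on bind failure is still worth keeping:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.Random;

/** Hypothetical helper: probe random ports in a range until one can be bound. */
final class RandomPortPicker {
  /** True if the port could be bound at the moment of the check. */
  static boolean isAvailable(int port) {
    try (ServerSocket socket = new ServerSocket(port)) {
      return true; // bind succeeded; the socket is closed again on exit
    } catch (IOException e) {
      return false;
    }
  }

  /** Tries up to maxTries random ports in [low, high]; 11 means "one try plus 10 more". */
  static int pick(int low, int high, int maxTries, Random random) {
    if (low < 1 || high > 65535 || low > high) {
      throw new IllegalArgumentException("bad port range [" + low + ", " + high + "]");
    }
    for (int i = 0; i < maxTries; i++) {
      int port = low + random.nextInt(high - low + 1);
      if (isAvailable(port)) {
        return port;
      }
    }
    throw new IllegalStateException("no available port found after " + maxTries + " tries");
  }
}
```

Usage: `int jmxPort = RandomPortPicker.pick(61000, 61999, 11, new Random());` before starting the miniCluster; the range here is an arbitrary example.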






[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict

2015-08-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708520#comment-14708520
 ] 

Lei Chen commented on HBASE-14293:
--

For patch v2:
1. It seems the do-while loop will keep trying to obtain an available port 
indefinitely. Under some extreme condition (e.g. network interface problems?), 
would the loop get stuck, or would JUnit trigger a timeout?
2. Nit: if the for loop is there to try different ports, is it still needed 
once the port-finding function is used?






[jira] [Commented] (HBASE-14293) TestStochasticBalancerJmxMetrics intermittently fails due to port conflict

2015-08-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708706#comment-14708706
 ] 

Lei Chen commented on HBASE-14293:
--

Patch v3 looks good. +1

 TestStochasticBalancerJmxMetrics intermittently fails due to port conflict
 --

 Key: HBASE-14293
 URL: https://issues.apache.org/jira/browse/HBASE-14293
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Fix For: 2.0.0

 Attachments: 14293-v1.txt, 14293-v2.txt, 14293-v3.txt







[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-08-15 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Attachment: HBASE-14082-v5.patch

Updates:
1. Javadoc fix

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, 
 HBASE-14082-v3.patch, HBASE-14082-v4.patch, HBASE-14082-v5.patch







[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-08-15 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Attachment: HBASE-14082-v4.patch

Updates:
1. Javadoc and wrapping long lines


 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, 
 HBASE-14082-v3.patch, HBASE-14082-v4.patch







[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-08-14 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Attachment: HBASE-14082-v3.patch

Updates:
1. Moved the replica id from the metrics name to a separate metric.
2. Updated the related test.

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, 
 HBASE-14082-v3.patch







[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-08-14 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Status: Patch Available  (was: Open)

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, 
 HBASE-14082-v3.patch


[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-08-12 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14694425#comment-14694425
 ] 

Lei Chen commented on HBASE-14082:
--

Thanks for the suggestion. I will upload a patch for 1.x soon.

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-05 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-branch-1.patch

Updates:
1. Included 13965-addendum which tries 5 different ports for JMX connection
2. Fix balancer already exists error in TestAssignmentManager.
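The retry idea in the addendum can be sketched as a small helper that tries consecutive ports until one binds. PortRetry and Binder are hypothetical names, and trying consecutive ports (rather than, say, random ones) is an assumption; the addendum's actual port selection is not shown in this thread.

```java
import java.io.IOException;

/**
 * Hypothetical test helper: try a bind operation on up to `attempts`
 * consecutive ports and return the result of the first one that succeeds.
 */
final class PortRetry {
  /** Something that binds to a port, e.g. starting a JMX connector. */
  interface Binder<T> {
    T bind(int port) throws IOException;
  }

  static <T> T bindWithRetries(int firstPort, int attempts, Binder<T> binder)
      throws IOException {
    IOException lastFailure = null;
    for (int i = 0; i < attempts; i++) {
      int port = firstPort + i;
      try {
        return binder.bind(port);
      } catch (IOException e) {
        lastFailure = e; // most likely the port is already in use; try the next one
      }
    }
    throw new IOException("Could not bind any port in [" + firstPort + ", "
        + (firstPort + attempts - 1) + "]", lastFailure);
  }
}
```

For a JMX test, a binder could be port -> LocateRegistry.createRegistry(port); that fits this signature because java.rmi.RemoteException is a subclass of IOException.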

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-branch-1.patch, 
 HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable, but no direct visibility 
 into the results of those cost functions is provided.
 A driving example is a cluster we have been tuning that has skewed rack sizes 
 (one rack has half as many nodes as each of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
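As a rough illustration of what attributing each cost function's contribution could mean, the sketch below assumes the overall cost is the weighted sum of the individual cost function results and reports each function's share of that sum. CostAttribution and CostTerm are hypothetical; the real balancer may normalize or combine costs differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the HBase implementation), assuming
 * overall cost = sum over cost functions of (weight * cost).
 */
final class CostAttribution {
  static final class CostTerm {
    final String name;
    final double weight; // tunable multiplier
    final double cost;   // the function's own cost for a candidate plan

    CostTerm(String name, double weight, double cost) {
      this.name = name;
      this.weight = weight;
      this.cost = cost;
    }
  }

  /** Overall cost: sum of weight * cost over all terms. */
  static double overallCost(CostTerm... terms) {
    double total = 0;
    for (CostTerm t : terms) {
      total += t.weight * t.cost;
    }
    return total;
  }

  /** Each function's share of the overall cost in [0, 1]; all zero if the total is zero. */
  static Map<String, Double> shares(CostTerm... terms) {
    double total = overallCost(terms);
    Map<String, Double> out = new LinkedHashMap<>();
    for (CostTerm t : terms) {
      out.put(t.name, total == 0 ? 0.0 : t.weight * t.cost / total);
    }
    return out;
  }
}
```

Per-function shares like these, next to the overall cost, are the kind of values a JMX bean could publish.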



[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-05 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-branch-1-v2.patch

Updates:
1. Wrapped a long line (> 100 characters).

The failed test from the last patch seems unrelated. Here is the log:

testWalRollOnLowReplication(org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS)
  Time elapsed: 3.804 sec   ERROR!
java.lang.RuntimeException: sync aborted
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:491)
at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.insert(WALProcedureStore.java:334)
at org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication(TestWALProcedureStoreOnHDFS.java:189)
Caused by: org.apache.hadoop.ipc.RemoteException: File 
/test-logs/state-0006.log could only be replicated to 2 nodes 
instead of minReplication (=3).  There are 3 datanode(s) running and 3 node(s) 
are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526)

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-branch-1-v2.patch, 
 HBASE-13965-branch-1.patch, HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v10.patch

Updates:
1. Spelling and formatting fixes.
2. Changed the LOG level to error when getting the size of all tables fails.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Status: Patch Available  (was: Open)

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652765#comment-14652765
 ] 

Lei Chen commented on HBASE-13965:
--

+1

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652247#comment-14652247
 ] 

Lei Chen commented on HBASE-14082:
--

Would it be simpler to also put the replica_id in Regions instead of 
creating a new MBean? 
The replica id could then be queried with wildcard matching, without needing to 
look it up in a name-to-replica_id map.

e.g.
{code}
Regions: {
namespace_default_table_foo_region_aaabbb_metric_mutateCount: 100,
namespace_default_table_foo_region_aaabbb_metric_replicaid: 0,
namespace_default_table_foo_region_bbbccc_metric_mutateCount: 100,
namespace_default_table_foo_region_bbbccc_metric_replicaid: 1,
}
{code}
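A minimal sketch of the wildcard-style lookup described above, working on a plain map of Regions attribute names to values (fetching the map over JMX is left out). ReplicaIdLookup is a hypothetical name; the suffix check plays the role of a *_metric_replicaid wildcard.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for the "replicaid as a separate metric" layout. */
final class ReplicaIdLookup {
  private static final String REPLICA_SUFFIX = "_metric_replicaid";
  private static final String METRIC_MARKER = "_metric_";

  /** Maps each region prefix (text before "_metric_") to its replica id. */
  static Map<String, Integer> replicaIds(Map<String, ? extends Number> regionsAttributes) {
    Map<String, Integer> ids = new TreeMap<>();
    for (Map.Entry<String, ? extends Number> e : regionsAttributes.entrySet()) {
      String key = e.getKey();
      if (key.endsWith(REPLICA_SUFFIX)) {
        String regionPrefix = key.substring(0, key.length() - REPLICA_SUFFIX.length());
        ids.put(regionPrefix, e.getValue().intValue());
      }
    }
    return ids;
  }

  /** True if the attribute belongs to a region whose replicaid metric is 0. */
  static boolean isPrimaryMetric(String attribute, Map<String, Integer> ids) {
    int idx = attribute.lastIndexOf(METRIC_MARKER);
    if (idx < 0) {
      return false;
    }
    Integer id = ids.get(attribute.substring(0, idx));
    return id != null && id == 0;
  }
}
```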

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652314#comment-14652314
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks, I will update it soon.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v11.patch

Updates:
1. License added for 
{{hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource}}


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-07-28 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645177#comment-14645177
 ] 

Lei Chen commented on HBASE-14082:
--

I noticed that there is a function called isDefaultReplica(int replicaid) in 
RegionReplicaUtil.java.
Does "default replica" mean the primary region, and should I use this function 
instead of checking replicaid  0 ?
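For reference, a standalone restatement of the check being asked about. This is not HBase's RegionReplicaUtil; it assumes, as we understand that class, that DEFAULT_REPLICA_ID is 0 and that isDefaultReplica(int) simply compares against it, which would make the default replica the primary region. ReplicaCheck and role are hypothetical names.

```java
/** Hypothetical restatement, assumed to mirror RegionReplicaUtil's check. */
final class ReplicaCheck {
  // Assumed to match RegionReplicaUtil.DEFAULT_REPLICA_ID: the primary has id 0.
  static final int DEFAULT_REPLICA_ID = 0;

  static boolean isDefaultReplica(int replicaId) {
    return replicaId == DEFAULT_REPLICA_ID;
  }

  /** Example use: label a region for metrics reporting. */
  static String role(int replicaId) {
    return isDefaultReplica(replicaId) ? "primary" : "replica-" + replicaId;
  }
}
```

Calling the shared helper rather than hard-coding a comparison keeps the metrics code correct if the meaning of the default id ever changes in one place.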

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch


[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-07-28 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Attachment: HBASE-14082-v2.patch

Updates:
1. When the replica id is > 0, use the following format
{code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code}
2. Test case updated.
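A sketch of the v2 naming rule, assuming only non-primary replicas (replica id > 0) get the extra segment, so names for primary regions (replica id 0) stay exactly as before. RegionMetricNames.name is a hypothetical helper, not the patch's code.

```java
/** Hypothetical builder for per-region metric attribute names. */
final class RegionMetricNames {
  static String name(String namespace, String table, String region,
      int replicaId, String metric) {
    StringBuilder sb = new StringBuilder()
        .append("namespace_").append(namespace)
        .append("_table_").append(table)
        .append("_region_").append(region);
    // Primary regions keep the old layout; only replicas get "_replicaid_<id>".
    if (replicaId > 0) {
      sb.append("_replicaid_").append(replicaId);
    }
    return sb.append("_metric_").append(metric).toString();
  }
}
```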



 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-27 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v9.patch

Updates:
1. added a null pointer check in getMetrics( )


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-26 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14642069#comment-14642069
 ] 

Lei Chen commented on HBASE-13965:
--

I will attach a summary and a patch soon. Thanks for the reminder.




 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-26 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBase-13965-JConsole.png

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-07-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639875#comment-14639875
 ] 

Lei Chen commented on HBASE-14082:
--

If an existing parser uses exact-string matching or order-sensitive key-value 
matching, this change may break backward compatibility. 

For example, if we have two regions, one primary and one secondary,
{{namespace_default_table_sales_region__metric_storeCount}}
{{namespace_default_table_sales_region__metric_storeCount}}
becomes
{{namespace_default_table_sales_region__metric_storeCount}}
{{namespace_default_table_sales_region__replicaid_1_metric_storeCount}}

Since I'm not familiar with any existing parser, please correct me if I made 
any false assumption.
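To illustrate the concern, here is a hypothetical comparison of two consumers reading the attribute names after the change. ParserCompat is illustrative only, and r1/r2 are placeholder region names, not the ones from the example above.

```java
import java.util.List;

/** Hypothetical consumers of the Regions attribute names. */
final class ParserCompat {
  /** A consumer that looks up one exact, hard-coded attribute name. */
  static boolean exactLookup(List<String> attributes, String expectedName) {
    return attributes.contains(expectedName);
  }

  /** A tolerant consumer: matches on table prefix and metric suffix only. */
  static long countMetric(List<String> attributes, String tablePrefix, String metric) {
    String suffix = "_metric_" + metric;
    return attributes.stream()
        .filter(a -> a.startsWith(tablePrefix) && a.endsWith(suffix))
        .count();
  }
}
```

The exact lookup misses the renamed secondary, while the prefix/suffix consumer still counts both regions.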

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-23 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639877#comment-14639877
 ] 

Lei Chen commented on HBASE-13965:
--

I've been trying to apply the patch to a test HBase cluster to verify it.
I'm currently experiencing some compatibility issues. 
I will update the status once I have results.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the stochastic load balancer) is based on 
 cost functions. The cost function weights are tunable, but there is no direct 
 visibility into the results of those cost functions.
 A driving example is a cluster we have been tuning that has skewed rack sizes 
 (one rack has half as many nodes as each of the other few racks). We are tuning 
 the cluster for uniform response time from all region servers, with the ability 
 to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRackCost and 
 RegionCountSkewCost is difficult without a way to attribute each cost 
 function’s contribution to the overall cost. 
 This JIRA proposes to provide visibility via JMX into each cost function 
 of the stochastic load balancer, as well as into the overall cost of the 
 balancing plan.
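To make the attribution problem concrete, here is a simplified sketch that assumes the overall cost is a weighted sum of normalized per-function costs. The weights and cost values are made up for illustration and are not HBase defaults, and the class is hypothetical. Exposing each weighted term, not only the total, is the kind of per-function visibility proposed here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model: overall = sum(weight_i * cost_i). Illustrative only.
final class CostBreakdown {
    private CostBreakdown() {}

    /** Weighted contribution of each cost function (weight * normalized cost). */
    static Map<String, Double> contributions(Map<String, Double> weights,
                                             Map<String, Double> costs) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            double cost = costs.getOrDefault(e.getKey(), 0.0); // missing cost counts as 0
            out.put(e.getKey(), e.getValue() * cost);
        }
        return out;
    }

    /** Overall cost: the sum of all weighted contributions. */
    static double overall(Map<String, Double> contributions) {
        double total = 0.0;
        for (double c : contributions.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("LocalityCost", 2.0);
        weights.put("RegionReplicaRackCost", 4.0);
        weights.put("RegionCountSkewCost", 1.0);

        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("LocalityCost", 0.5);
        costs.put("RegionReplicaRackCost", 0.25);
        costs.put("RegionCountSkewCost", 0.5);

        Map<String, Double> parts = contributions(weights, costs);
        parts.forEach((name, value) -> System.out.println(name + " = " + value));
        System.out.println("Overall = " + overall(parts));
    }
}
```

With these made-up numbers, LocalityCost and RegionReplicaRackCost each contribute 1.0 and RegionCountSkewCost contributes 0.5 of the 2.5 total, which is exactly the breakdown that is invisible when only the total is known.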





[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-07-17 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632087#comment-14632087
 ] 

Lei Chen commented on HBASE-14082:
--

Changing the metric names raises a compatibility issue.
Instead, could we add one extra metric per region whose value is its replica id?
For example:
{{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_storeCount}}
{{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_storeFileCount}}
{{namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_memStoreSize}}
{{{color:red}namespace_default_table_sales_region_00254ff636363bd577b9a66de7cc4bbf_metric_replicaid{color}}}
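A minimal sketch of this alternative, assuming a simplified, hypothetical RegionView interface standing in for the real per-region metrics wrapper. Existing metric names stay unchanged, and each region gets one extra "..._metric_replicaid" gauge whose value is the replica id:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a per-region metrics wrapper. */
interface RegionView {
    String namespace();
    String table();
    String encodedName();
    int replicaId();
    long storeCount();
}

/** Plain immutable implementation used for the example. */
final class SimpleRegion implements RegionView {
    private final String namespace, table, encodedName;
    private final int replicaId;
    private final long storeCount;

    SimpleRegion(String namespace, String table, String encodedName,
                 int replicaId, long storeCount) {
        this.namespace = namespace;
        this.table = table;
        this.encodedName = encodedName;
        this.replicaId = replicaId;
        this.storeCount = storeCount;
    }

    public String namespace() { return namespace; }
    public String table() { return table; }
    public String encodedName() { return encodedName; }
    public int replicaId() { return replicaId; }
    public long storeCount() { return storeCount; }
}

final class ReplicaIdGauge {
    private ReplicaIdGauge() {}

    /** Emits an existing metric under its unchanged name plus one extra "replicaid" gauge. */
    static Map<String, Long> snapshot(RegionView r) {
        String prefix = "namespace_" + r.namespace() + "_table_" + r.table()
            + "_region_" + r.encodedName() + "_metric_";
        Map<String, Long> gauges = new LinkedHashMap<>();
        gauges.put(prefix + "storeCount", r.storeCount());
        gauges.put(prefix + "replicaid", (long) r.replicaId());
        return gauges;
    }

    public static void main(String[] args) {
        RegionView r = new SimpleRegion("default", "sales",
            "00254ff636363bd577b9a66de7cc4bbf", 1, 3L);
        snapshot(r).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Dashboards that only read the existing names keep working, while a consumer that needs to tell primaries from replicas can read the extra gauge under the same region prefix.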


 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch




[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-07-15 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Attachment: HBASE-14082-v1.patch

Updates:
1. Added getReplicaId() to MetricsRegionWrapper
2. Added a replicaid name/value pair to MetricsRegionSourceImpl
3. Updated test case TestMetricsRegion

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch




[jira] [Updated] (HBASE-14082) Add replica id to JMX metrics names

2015-07-15 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-14082:
-
Status: Patch Available  (was: Open)

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch




[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-07-15 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628738#comment-14628738
 ] 

Lei Chen commented on HBASE-14082:
--

Thanks for pointing out the compatibility issue. I will propose a change 
shortly.

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch




[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-14 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v8.patch

Updates:
1. Use the total number of tables (including system tables) to calculate the size 
of the MRU map. This should be fine, since the goal is to avoid OOM, not to 
calculate the exact number of metrics needed.
2. Formatting and spelling improvements.
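A sketch of the sizing idea, assuming the metrics are kept in an access-ordered LinkedHashMap that evicts its least recently used entry once a cap is reached (a common JDK idiom; the class name is hypothetical and this is not the patch code). The cap allows one overall cost plus one cost per cost function for every table, so over-counting tables, e.g. by including system tables, only makes the cap slightly larger:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Access-ordered map that drops its least recently used entry past a fixed cap. */
final class BoundedMetricsMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedMetricsMap(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: get/put move an entry to the end
        this.maxEntries = maxEntries;
    }

    /** One overall cost plus one entry per cost function, for every table. */
    static int capacityFor(int tablesCount, int costFunctionsCount) {
        return tablesCount * (costFunctionsCount + 1);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
    }

    public static void main(String[] args) {
        System.out.println("capacity = " + capacityFor(3, 2)); // 3 tables, 2 functions

        BoundedMetricsMap<String, Double> m = new BoundedMetricsMap<>(2);
        m.put("Table1_Overall", 1.0);
        m.put("Table2_Overall", 2.0);
        m.get("Table1_Overall");        // touch Table1 so Table2 becomes the eldest
        m.put("Table3_Overall", 3.0);   // exceeds the cap of 2, evicts Table2_Overall
        System.out.println(m.keySet()); // [Table1_Overall, Table3_Overall]
    }
}
```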

TODO:
1. The unit test uses 61120 as the JMX registry port. One of the recent QA runs 
reported a "Port already in use" error. Should I change the port?
2. The last two patches failed the core tests. However, I'm not sure that the 
failed test, TestWALProcedureStoreOnHDFS.testWalRollOnLowReplication, is 
related to this patch.
3. As for removing the per-table mode entirely, I'm not sure it belongs in 
this JIRA.
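On the port question, one common approach in tests is to let the OS pick a free ephemeral port by binding to port 0 instead of hard-coding 61120. A minimal sketch (the helper name is hypothetical, and this is not what the patch currently does):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePort {
    private FreePort() {}

    /** Binds to port 0 so the OS assigns an unused port, then releases it. */
    static int find() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("free port: " + find());
    }
}
```

The port is released before the test uses it, so another process could still take it in between. This narrows the chance of a "Port already in use" failure but does not eliminate it.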


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Created] (HBASE-14082) Add replica id to JMX metrics names

2015-07-14 Thread Lei Chen (JIRA)
Lei Chen created HBASE-14082:


 Summary: Add replica id to JMX metrics names
 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen


Today, via JMX, one cannot distinguish a primary region from a replica. A 
possible solution is to add replica id to JMX metrics names. The benefits may 
include, for example:
# Read latency observed on a replica region means the first attempt to the 
primary region timed out.
# Write requests on replicas come from the replication process, while those on 
the primary come from clients.
# When looking for read hot spots, replicas should be excluded, since TIMELINE 
reads are sent to all replicas.

To implement this, we can change the format of the metric names found under 
{code}Hadoop-HBase-RegionServer-Regions-Attributes{code}
from 
{code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code}
to
{code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code}






[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-07-14 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627396#comment-14627396
 ] 

Lei Chen commented on HBASE-14082:
--

If the changes are just
# adding a getter, getReplicaId(), to MetricsRegionWrapper.java
# inserting the string "_replicaid_" + regionWrapper.getReplicaId() into 
MetricsRegionSourceImpl.java

should I still include a test case? 
I'm not sure whether every change is expected to be covered by unit tests.

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen



[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-13 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624668#comment-14624668
 ] 

Lei Chen commented on HBASE-13965:
--

Do you mean removing all the code for per-table balancing, as well as any 
documentation?
If balancing is always performed on the ensemble table, then no table name 
is needed in the JMX metric names, right?

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-12 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624048#comment-14624048
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for pointing it out.

I ran some tests and found that there are two system tables, hbase:meta and 
hbase:namespace.
The first one, hbase:meta, is not passed to the balancer, while hbase:namespace 
is.

Maybe I can just take the total number of tables and subtract 1 for hbase:meta? 
That would avoid the loop.
{code}tablesCount = services.getTableDescriptors().getAll().size();
// -1 for removing a system table, hbase:meta, which will not be balanced.
tablesCount--;
{code}

However, this assumes that exactly one system table is excluded from 
balancing. Is there a better solution?
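One way to avoid depending on that assumption is to exclude tables by name rather than subtracting a fixed count. A minimal sketch over plain table-name strings; the class and the exclusion set are hypothetical, and a real patch would take the names from the table descriptors:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class BalancedTableCount {
    private BalancedTableCount() {}

    /** Tables assumed never to be handed to the balancer (illustrative). */
    static final Set<String> NOT_BALANCED =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("hbase:meta")));

    /** Counts the tables that are not in the exclusion set. */
    static int count(Collection<String> allTableNames) {
        int n = 0;
        for (String name : allTableNames) {
            if (!NOT_BALANCED.contains(name)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("hbase:meta", "hbase:namespace", "sales")));
    }
}
```

This does reintroduce a loop, which the snippet above was trying to avoid, but it stays correct if hbase:meta is absent from the list or if another table ever needs excluding: only the set changes, not the arithmetic.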

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-12 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14623933#comment-14623933
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for the feedback.

{quote}Should we do early return if the table is system table?{quote}
Should table filtering happen before or outside updateStochasticCosts (e.g., in 
HMaster)? 
Since HMaster sends hbase:namespace for balancing, there may be a reason for 
that, and I think including its costs in JMX could help, just in case.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-10 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622705#comment-14622705
 ] 

Lei Chen commented on HBASE-13965:
--

Ted, Clay, and stack, thanks for the suggestions. 

Currently, in ensemble mode, the combined table is named "ensemble", so it 
might be a good idea to use that name directly, for example 
ensemble_Overall or ensemble_costFunction1. 

Similarly, in per-table mode, the system table hbase:namespace can be reported 
to JMX using its name directly as well. The benefit is that we avoid any 
special-case handling in the balancer: the balancer simply calculates and 
reports costs for whatever tables HMaster gives it. 

I'm updating the unit test to cover both modes. 


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-10 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622782#comment-14622782
 ] 

Lei Chen commented on HBASE-13965:
--

I think the ensemble table is more like a temporary table, and it is not in any 
namespace.
https://github.com/apache/hbase/blob/0.98/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L792

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-10 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v7.patch

Updates:
1. Overloaded balanceCluster() to pass the table name to the balancer.
2. Moved some string constants to HConstants.java.
3. The stochastic balancer automatically adjusts the JMX metrics size based on 
the number of tables.
4. The stochastic balancer handles both ensemble and per-table modes.
5. Updated tests to cover both modes.

TODO:
1. The tests currently use the miniCluster only to save and read JMX metrics, 
which means the tables are not actually stored in HBase. I'm NOT sure whether 
this is adequate, or whether we need to create real tables in the miniCluster 
and actually balance them.

Sorry guys, I still cannot upload the patch file to Review Board. The diff file 
always fails with a "No valid separator after the filename was found in the diff 
header" error. If I manually edit the file to add "(revision )" or 
"(working copy)", I get a "revision  cannot be found" error. The command-line 
tool rbt has the same problem.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-10 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622900#comment-14622900
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for clarifying. 
I will use hbase:ensemble for this cute table-like thing :)

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965_v2.patch, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-09 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621641#comment-14621641
 ] 

Lei Chen commented on HBASE-13965:
--

I also found that when {{hbase.master.loadbalancer.bytable}} is set to true, 
balancing is also performed on the table hbase:namespace, which is a system 
table. Should the costs of hbase:namespace be reported to JMX the same way as 
those of user tables? 
Any ideas?

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-09 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621457#comment-14621457
 ] 

Lei Chen commented on HBASE-13965:
--

I have found a problem related to HBASE-5231 (per-table load balancing).
It seems that balancing is done by iterating over tables. 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1219-L1228

This can be configured to be in per-table mode or ensemble mode. 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java#L956-L962

In ensemble mode, all the tables are copied into a single ensemble table for 
balancing. The mode is controlled by
 {{hbase.master.loadbalancer.bytable}}
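The difference between the two modes can be sketched as a grouping step. This simplified model uses hypothetical names, and the real code maps tables to servers to regions rather than tables to region lists. It puts each table's regions in its own group when bytable is true, and merges all regions into a single group named "ensemble" otherwise:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BalanceGroups {
    private BalanceGroups() {}

    static final String ENSEMBLE = "ensemble";

    /** byTable=true: one group per table; byTable=false: one merged group. */
    static Map<String, List<String>> group(Map<String, List<String>> regionsByTable,
                                           boolean byTable) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : regionsByTable.entrySet()) {
            String key = byTable ? e.getKey() : ENSEMBLE;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).addAll(e.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, List<String>> input = new TreeMap<>();
        input.put("Table1", Arrays.asList("r1", "r2"));
        input.put("Table2", Arrays.asList("r3"));
        System.out.println(group(input, true));  // {Table1=[r1, r2], Table2=[r3]}
        System.out.println(group(input, false)); // {ensemble=[r1, r2, r3]}
    }
}
```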

My question is how to name the metrics when balancing in ensemble mode.

For example, suppose we have two tables, Table1 and Table2, and N cost 
functions.
In per-table mode, each table will have an overall cost plus one cost per 
cost function.
{{Table1_Overall}}
{{Table1_costFunction}} x N
{{Table2_Overall}}
{{Table2_costFunction}} x N
In ensemble mode, there will be only one overall cost and one set of 
cost-function costs.
{{ensemble_Overall}}
{{ensemble_costFunction}} x N
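The naming scheme above can be sketched as one "_Overall" name plus one name per cost function for each group. The cost function names below are illustrative and the helper is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BalancerMetricNames {
    private BalancerMetricNames() {}

    /** groupName_Overall followed by groupName_<costFunction> for each function. */
    static List<String> namesFor(String groupName, List<String> costFunctions) {
        List<String> names = new ArrayList<>();
        names.add(groupName + "_Overall");
        for (String fn : costFunctions) {
            names.add(groupName + "_" + fn);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> fns = Arrays.asList("LocalityCost", "RegionCountSkewCost");
        // Per-table mode: one set of names per table.
        for (String table : Arrays.asList("Table1", "Table2")) {
            System.out.println(namesFor(table, fns));
        }
        // Ensemble mode: a single set of names for the merged group.
        System.out.println(namesFor("ensemble", fns));
    }
}
```

Because the group name is used verbatim as a prefix, a user table literally named ensemble would produce the same metric names as the merged group, which is the source of confusion raised below.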

Can we use a special name for the combined table, e.g. "ensemble"? The problem 
is that a user may have already created a table named "ensemble", which could 
cause confusion. 

Any ideas on this problem?


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-08 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v6.patch


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-08 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: (was: HBASE-13965-v6.patch)


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-08 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618997#comment-14618997
 ] 

Lei Chen commented on HBASE-13965:
--

Yes, sounds good, since the full list of cost functions should be known to the 
user.


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-08 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619554#comment-14619554
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks [~clayb] for the suggestion.
I have found that the stochastic load balancer holds a reference to HMaster, 
which can be used to get the number of tables, so the size of the map can be 
determined from it. There is no need for a configurable value. I will update 
the patch soon.
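A minimal sketch of the sizing arithmetic, assuming per-table mode (one Overall entry plus one entry per cost function for each table) and that the table count has already been obtained through the HMaster reference. {{CostMetricsMapSizing}} and its methods are hypothetical names, not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: derives the map size from the table count
// instead of a hard-coded or configurable constant.
final class CostMetricsMapSizing {
  private CostMetricsMapSizing() {}

  /** Per table: one "Overall" entry plus one entry per cost function. */
  static int expectedEntries(int tableCount, int costFunctionCount) {
    return tableCount * (costFunctionCount + 1);
  }

  /** Initial capacity that avoids rehashing under HashMap's default 0.75 load factor. */
  static int initialCapacity(int expectedEntries) {
    return (int) Math.ceil(expectedEntries / 0.75);
  }

  static Map<String, Double> newCostMap(int tableCount, int costFunctionCount) {
    return new HashMap<>(initialCapacity(expectedEntries(tableCount, costFunctionCount)));
  }
}
```

If an ensemble entry were kept in addition to the per-table entries, it would add one more group of {{costFunctionCount + 1}} entries.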


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-07 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617922#comment-14617922
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for testing the patch and posting the resulting metrics.
I agree that percentages are easier to read at a glance.
I will update the patch.


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-07 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v6.patch

I'm having difficulty creating a request on Review Board. When I upload a 
patch file generated by {{git diff --no-prefix master}}, I always get a "No valid 
separator after the filename was found in the diff header" error. Working on it.
In the meantime, I'm still uploading patch files here. 

Updates (trivial changes from v5 to v6):
1. Renamed some variables to more accurate names.
2. Report each cost function as a percentage.
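For illustration, one way to turn per-function costs into percentages, assuming the percentage is each weighted sub-cost divided by the overall cost (the patch's exact definition is not shown in this thread). {{CostPercentages}} is a hypothetical helper.

```java
// Hypothetical helper: expresses each weighted sub-cost as a share of the overall cost.
final class CostPercentages {
  private CostPercentages() {}

  /** Returns 100 * subCost / total for each entry; all zeros when the total is 0. */
  static double[] toPercent(double[] weightedSubCosts) {
    double total = 0;
    for (double c : weightedSubCosts) {
      total += c;
    }
    double[] pct = new double[weightedSubCosts.length];
    if (total == 0) {
      return pct; // avoid dividing by zero when every cost function reports 0
    }
    for (int i = 0; i < weightedSubCosts.length; i++) {
      pct[i] = 100.0 * weightedSubCosts[i] / total;
    }
    return pct;
  }
}
```

Percentages make the relative contribution of, say, locality versus region count skew visible at a glance, independent of the absolute scale of the overall cost.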

TODO:
1. Make hard-coded map size configurable?


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-06 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615518#comment-14615518
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for your review and great feedback. I will upload an updated patch.



[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-06 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v4.patch

Updates:
1. report -> reports
2. costFunctionDesc added to JMX
3. Unnecessary table name length check is removed. 
4. lastSubcosts -> lastSubCosts
5. total += this.lastSubCosts[i];
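A sketch of where item 5 fits in the overall cost computation, assuming each sub-cost is the cost function's raw cost times its configured multiplier and the overall cost is their sum. {{WeightedCostSketch}} is a hypothetical simplification, not the balancer's actual cost loop.

```java
// Hypothetical simplification: records each weighted sub-cost so it can be
// reported to JMX, and accumulates the overall cost from those same values.
final class WeightedCostSketch {
  private final double[] multipliers;
  private final double[] lastSubCosts;

  WeightedCostSketch(double[] multipliers) {
    this.multipliers = multipliers.clone();
    this.lastSubCosts = new double[multipliers.length];
  }

  /** rawCosts[i] is cost function i's unweighted cost; returns the overall cost. */
  double computeCost(double[] rawCosts) {
    double total = 0;
    for (int i = 0; i < multipliers.length; i++) {
      this.lastSubCosts[i] = multipliers[i] * rawCosts[i];
      total += this.lastSubCosts[i];
    }
    return total;
  }

  double[] lastSubCosts() {
    return lastSubCosts.clone(); // defensive copy for metric reporting
  }
}
```

Summing the stored sub-costs (rather than recomputing them) guarantees that the reported per-function values add up exactly to the reported overall cost.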

TODO:
1. Make hard-coded map size configurable?


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-06 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615909#comment-14615909
 ] 

Lei Chen commented on HBASE-13965:
--

Good point. It is more memory efficient to store the description only once per 
cost function. I will update the patch.


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-06 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v5.patch

Updates:
1. Only one copy of each cost function's description is stored, in a separate map.
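A minimal sketch of keeping descriptions apart from values, assuming values are keyed by table and cost function while descriptions are keyed by cost function only. {{StochasticCostStore}} is hypothetical; it reuses the parameter list of the proposed {{updateStochasticCost}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: descriptions are kept once per cost function,
// no matter how many tables report a value for that function.
final class StochasticCostStore {
  private final Map<String, String> descriptions = new HashMap<>();
  private final Map<String, Double> values = new HashMap<>();

  void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value) {
    descriptions.putIfAbsent(costFunctionName, costFunctionDesc); // stored only once
    values.put(tableName + "_" + costFunctionName, value);        // latest value per table
  }

  int descriptionCount() {
    return descriptions.size();
  }

  int valueCount() {
    return values.size();
  }

  Double value(String tableName, String costFunctionName) {
    return values.get(tableName + "_" + costFunctionName);
  }
}
```

With T tables and F cost functions, this holds F descriptions instead of T * F copies.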

TODO:
1. Make hard-coded map size configurable?


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-06 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v3.patch

Update:
1. The ever-growing map is now capped at 1000 entries (hard-coded) using a 
cache that keeps the most recently used entries (i.e. evicts the least recently used ones). 
2. Checkstyle warnings fixed.

TODO:
1. Make the hard-coded map size configurable?




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-01 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610601#comment-14610601
 ] 

Lei Chen commented on HBASE-13965:
--

I agree that the unused balancers should be purged or made into attributes of 
the stochastic load balancer. I think it would be better to do that in a 
separate JIRA, since “One thing at a time”.

About the ever-growing map, I’m thinking of two ways to solve this problem: 
1.  Besides updateStochasticCost, add another method (or a boolean 
parameter) to be called when a table is deleted. This will allow 
the map to contain only existing tables.
2.  Store the map in a fixed-size cache that keeps the most recently used 
entries. The size can be configurable. 
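Both options can be sketched on top of {{java.util.LinkedHashMap}}. This is an illustration under assumptions, not the patch: keys are hypothetical (table, cost function) pairs, and the bounded variant uses access order plus {{removeEldestEntry}}, which keeps the most recently used entries by evicting the least recently used one (commonly called an LRU cache).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded map keyed by (table name, cost function name).
final class BoundedCostMap extends LinkedHashMap<Map.Entry<String, String>, Double> {
  private static final long serialVersionUID = 1L;
  private final int maxEntries;

  BoundedCostMap(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true: get/put move an entry to the "recent" end
    this.maxEntries = maxEntries;
  }

  // Option 2: after each insertion, drop the least recently used entry if over the cap.
  @Override
  protected boolean removeEldestEntry(Map.Entry<Map.Entry<String, String>, Double> eldest) {
    return size() > maxEntries;
  }

  // Option 1: remove every metric of a deleted table. Using a pair as the key
  // avoids false matches between tables like "T1" and "T1_x".
  void removeTable(String tableName) {
    keySet().removeIf(key -> key.getKey().equals(tableName));
  }

  static Map.Entry<String, String> key(String tableName, String costFunctionName) {
    return Map.entry(tableName, costFunctionName);
  }
}
```

The two options are not exclusive: the cap bounds memory even if a deletion notification is missed, while explicit removal keeps stale tables from occupying slots until they age out.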

Any suggestion?



[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-07-01 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965_v2.patch

Changes:
1. License added for new classes.
2. Javadoc updated.
3. Several commits squashed into one.
4. Use  != null, not null != 

TODO:
1. The ever-growing map in MetricsStochasticBalancerSourceImpl.java



[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-30 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609412#comment-14609412
 ] 

Lei Chen commented on HBASE-13965:
--

Thanks for the feedback. I appreciate your detailed review and help. I will 
modify the patch.  


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-30 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBase-13965-v1.patch


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-29 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: stochasticloadbalancerclasses_v2.png

Class diagram before and after the patch. Other balancers will work the same way as before.


[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-29 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606864#comment-14606864
 ] 

Lei Chen commented on HBASE-13965:
--

Sorry for the delay. I propose the following changes, which are illustrated 
in the attached class diagram.

The {{StochasticLoadBalancer}} extends {{BaseLoadBalancer}} which has a class 
variable of type {{MetricsBalancer}}. The {{MetricsBalancer}} contains a 
private class variable of type {{MetricsBalancerSource}}. The 
{{MetricsBalancerSource}} is an interface which defines which metrics can be 
reported to JMX. This makes it difficult to add metrics specific to one load 
balancer implementation (e.g. the {{StochasticLoadBalancer}}).

Adding metrics to the generic interface is not appropriate, because it is used by 
all load balancers and should not contain any balancer-specific metrics. I 
propose to create a class extending {{MetricsBalancer}} to provide specific 
load balancer metrics. To use this class, I propose to add a constructor to 
{{BaseLoadBalancer}} which allows for the balancer instance metrics class to be 
passed in. (Thanks [~enis] for code review and giving the constructor 
suggestion!)

In the constructor of {{StochasticLoadBalancer}}, an instance of 
{{MetricsStochasticBalancer}} is created and passed to a new constructor added 
to {{BaseLoadBalancer}}, which will use it to replace the default 
{{MetricsBalancer}}. The function used to add metrics is declared as follows:
{code}
public void updateStochasticCost(String tableName, String costFunctionName, 
String costFunctionDesc, Double value);
{code}

In {{MetricsBalancer}}, the {{private final}} class variable {{source}} was 
previously hardcoded and instantiated in its constructor; I propose a new 
function {{initSource}} which can be overridden to set this variable. As such, 
in the subclass {{MetricsStochasticBalancer}}, {{initSource}} will create a 
{{MetricsStochasticBalancerSource}} instance instead of the default 
{{MetricsBalancerSource}}.
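A compressed sketch of the proposed wiring, using minimal stand-ins for the classes named above. These are not HBase's real classes; only the members discussed here are modeled, and {{incrBalanceCalls}}, {{BasicBalancerSource}} and {{StochasticBalancerSource}} are hypothetical. One detail worth noting: Java only allows a {{final}} field to be assigned in a constructor or initializer, so if {{initSource}} assigns {{source}}, the field can no longer be {{private final}}.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the proposal; not HBase's actual code.
interface MetricsBalancerSource {
  void incrBalanceCalls(); // hypothetical metric shared by all balancers
}

interface MetricsStochasticBalancerSource extends MetricsBalancerSource {
  void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value);
}

class BasicBalancerSource implements MetricsBalancerSource {
  int balanceCalls;

  @Override
  public void incrBalanceCalls() {
    balanceCalls++;
  }
}

class StochasticBalancerSource extends BasicBalancerSource
    implements MetricsStochasticBalancerSource {
  final Map<String, Double> costs = new HashMap<>();

  @Override
  public void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value) {
    costs.put(tableName + "_" + costFunctionName, value);
  }
}

class MetricsBalancer {
  // Non-final, because the overridable initSource() assigns it.
  protected MetricsBalancerSource source;

  MetricsBalancer() {
    initSource(); // caveat: runs before any subclass field initializers
  }

  protected void initSource() {
    source = new BasicBalancerSource();
  }

  void balancerCalled() {
    source.incrBalanceCalls();
  }
}

class MetricsStochasticBalancer extends MetricsBalancer {
  @Override
  protected void initSource() {
    source = new StochasticBalancerSource();
  }

  void updateStochasticCost(String tableName, String costFunctionName,
      String costFunctionDesc, Double value) {
    ((MetricsStochasticBalancerSource) source)
        .updateStochasticCost(tableName, costFunctionName, costFunctionDesc, value);
  }
}

abstract class BaseLoadBalancer {
  protected final MetricsBalancer metricsBalancer;

  BaseLoadBalancer() {
    this(new MetricsBalancer()); // existing behavior for all other balancers
  }

  // Proposed extra constructor: a subclass passes in its own metrics class.
  BaseLoadBalancer(MetricsBalancer metricsBalancer) {
    this.metricsBalancer = metricsBalancer;
  }
}

class StochasticLoadBalancer extends BaseLoadBalancer {
  StochasticLoadBalancer() {
    super(new MetricsStochasticBalancer());
  }

  MetricsStochasticBalancer stochasticMetrics() {
    return (MetricsStochasticBalancer) metricsBalancer;
  }
}
```

Because {{initSource}} is called from the superclass constructor, an override should only assign {{source}} and must not rely on the subclass's own fields, which are not yet initialized at that point.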

Finally, to give good insight into the internal state of 
{{StochasticLoadBalancer}}, we are considering adding metrics for each cost 
function, as well as the overall cost. For example, if balancing is carried 
out for table MyTable1 and the {{StochasticLoadBalancer}} has 3 cost 
functions (MoveCost, LocalityCost, and RegionReplicaHostCost), then 4 
metrics will be added to “HBase - Master - Balancer” as follows:
MyTable1_Overall
MyTable1_MoveCost
MyTable1_LocalityCost
MyTable1_RegionReplicaHostCost
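A small sketch of this naming scheme, assuming metric names are simply the table name and the cost function name joined by an underscore, plus one "_Overall" entry per table. {{CostMetricNames}} is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds the per-table metric names listed above.
final class CostMetricNames {
  private CostMetricNames() {}

  static List<String> namesFor(String tableName, List<String> costFunctionNames) {
    List<String> names = new ArrayList<>();
    names.add(tableName + "_Overall"); // overall cost of the balancing plan
    for (String fn : costFunctionNames) {
      names.add(tableName + "_" + fn); // one metric per cost function
    }
    return names;
  }
}
```

For N cost functions this yields N + 1 metrics per table, which matches the 4 metrics in the MyTable1 example.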

I'm building the patch, any suggestion is appreciated.


[jira] [Assigned] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-24 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen reassigned HBASE-13965:


Assignee: Lei Chen


[jira] [Created] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-06-24 Thread Lei Chen (JIRA)
Lei Chen created HBASE-13965:


 Summary: Stochastic Load Balancer JMX Metrics
 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen


Today’s default HBase load balancer (the Stochastic load balancer) is cost 
function based. The cost function weights are tunable, but no direct visibility 
into the cost function results is provided.

A driving example is a cluster we have been tuning which has skewed rack sizes 
(one rack has half the nodes of the other few racks). We are tuning the cluster 
for uniform response time from all region servers with the ability to tolerate 
a rack failure. Balancing LocalityCost, RegionReplicaRackCost and 
RegionCountSkewCost is difficult without a way to attribute each cost 
function’s contribution to the overall cost. 
What this JIRA proposes is to provide visibility via JMX into each cost 
function of the stochastic load balancer, as well as the overall cost of the 
balancing plan.



