[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
 Hadoop Flags: Reviewed
Fix Version/s: 1.3.0, 2.0.0

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
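
 To illustrate the attribution problem described above, here is a minimal sketch, assuming the overall cost is a weighted sum of the individual cost function results. The class name CostAttribution, its methods, and all weights and cost values are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not the HBase implementation. */
final class CostAttribution {

    /** Sums weight * cost over all functions. Each value is {weight, cost}. */
    static double totalCost(Map<String, double[]> functions) {
        double total = 0.0;
        for (double[] weightAndCost : functions.values()) {
            total += weightAndCost[0] * weightAndCost[1];
        }
        return total;
    }

    /** Returns each function's share of the total (0.0 for every entry if the total is 0). */
    static Map<String, Double> contributions(Map<String, double[]> functions) {
        double total = totalCost(functions);
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : functions.entrySet()) {
            double weighted = e.getValue()[0] * e.getValue()[1];
            shares.put(e.getKey(), total == 0.0 ? 0.0 : weighted / total);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Made-up weights and costs, chosen only to show the arithmetic.
        Map<String, double[]> functions = new LinkedHashMap<>();
        functions.put("LocalityCost", new double[] {4.0, 0.5});
        functions.put("RegionReplicaRackCost", new double[] {2.0, 0.25});
        functions.put("RegionCountSkewCost", new double[] {2.0, 0.25});

        System.out.println("overall cost = " + totalCost(functions));
        contributions(functions).forEach(
            (name, share) -> System.out.println(name + " share = " + share));
    }
}
```

 Exposing the per-function values alongside the overall cost, as the issue proposes, is what makes this kind of breakdown possible from a monitoring tool.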



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v10.patch

Updates:
1. Spelling and formatting
2. Log level changed to ERROR when getting the size of all tables fails.



[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Status: Patch Available  (was: Open)



[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651944#comment-14651944
 ] 

Hadoop QA commented on HBASE-14178:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12748436/HBASE-14178_v4.patch
  against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133.
  ATTACHMENT ID: 12748436

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 2 zombie test(s):   
at 
org.apache.hadoop.mapred.TestMRIntermediateDataEncryption.testMultipleReducers(TestMRIntermediateDataEncryption.java:70)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14962//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14962//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14962//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14962//console

This message is automatically generated.

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, 
 HBASE-14178_v4.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads are blocked 
 waiting for offsetLock. Detailed information is below:
 PS:  my table's block cache is off
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
    java.lang.Thread.State: WAITING (on object monitor)
     at java.lang.Object.wait(Native Method)
     at java.lang.Object.wait(Object.java:502)
     at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
     - locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
     at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
     at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
     at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
     at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
     at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 

[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652037#comment-14652037
 ] 

Ted Yu commented on HBASE-13965:


Planning to commit once QA run passes.



[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652053#comment-14652053
 ] 

Heng Chen commented on HBASE-14178:
---

{quote}
You can see the call to 
cacheConf.shouldCacheBlockOnRead(expectedBlockType.getCategory()) checks wrt 
whether the read request says the block to be cached after this read. It is not 
telling abt the CF level setting of whether data to be cached at all or not.
{quote}

cacheDataOnRead represents the CF-level cache setting. You can see that 
cacheDataOnRead is initialized in the CacheConfig constructor:
{code}
  CacheConfig(final BlockCache blockCache,
      final boolean cacheDataOnRead, final boolean inMemory,
      final boolean cacheDataOnWrite, final boolean cacheIndexesOnWrite,
      final boolean cacheBloomsOnWrite, final boolean evictOnClose,
      final boolean cacheDataCompressed, final boolean prefetchOnOpen,
      final boolean cacheDataInL1) {
    this.blockCache = blockCache;
    this.cacheDataOnRead = cacheDataOnRead;
    this.inMemory = inMemory;
    this.cacheDataOnWrite = cacheDataOnWrite;
    this.cacheIndexesOnWrite = cacheIndexesOnWrite;
    this.cacheBloomsOnWrite = cacheBloomsOnWrite;
    this.evictOnClose = evictOnClose;
    this.cacheDataCompressed = cacheDataCompressed;
    this.prefetchOnOpen = prefetchOnOpen;
    this.cacheDataInL1 = cacheDataInL1;
    LOG.info(this);
  }
{code}

And this constructor is called by another constructor 
{code}
  public CacheConfig(Configuration conf, HColumnDescriptor family) {
    this(CacheConfig.instantiateBlockCache(conf),
        family.isBlockCacheEnabled(),
        family.isInMemory(),
        // For the following flags we enable them regardless of per-schema settings
        // if they are enabled in the global configuration.
        conf.getBoolean(CACHE_BLOCKS_ON_WRITE_KEY,
            DEFAULT_CACHE_DATA_ON_WRITE) || family.isCacheDataOnWrite(),
        conf.getBoolean(CACHE_INDEX_BLOCKS_ON_WRITE_KEY,
            DEFAULT_CACHE_INDEXES_ON_WRITE) || family.isCacheIndexesOnWrite(),
        conf.getBoolean(CACHE_BLOOM_BLOCKS_ON_WRITE_KEY,
            DEFAULT_CACHE_BLOOMS_ON_WRITE) || family.isCacheBloomsOnWrite(),
        conf.getBoolean(EVICT_BLOCKS_ON_CLOSE_KEY,
            DEFAULT_EVICT_ON_CLOSE) || family.isEvictBlocksOnClose(),
        conf.getBoolean(CACHE_DATA_BLOCKS_COMPRESSED_KEY, DEFAULT_CACHE_DATA_COMPRESSED),
        conf.getBoolean(PREFETCH_BLOCKS_ON_OPEN_KEY,
            DEFAULT_PREFETCH_ON_OPEN) || family.isPrefetchBlocksOnOpen(),
        conf.getBoolean(HColumnDescriptor.CACHE_DATA_IN_L1,
            HColumnDescriptor.DEFAULT_CACHE_DATA_IN_L1) || family.isCacheDataInL1()
    );
  }
{code}
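
The difference between the two kinds of arguments in the constructor call quoted above can be sketched in isolation. This is a simplified stand-in, not HBase code: the class CacheFlagResolution and its method names are hypothetical, and plain booleans replace Configuration and HColumnDescriptor.

```java
/** Hypothetical stand-in for the flag resolution seen in the quoted constructor call. */
final class CacheFlagResolution {

    /** Flags such as cacheDataOnWrite: on if the global config OR the column family enables them. */
    static boolean globalOrFamily(boolean globalSetting, boolean familySetting) {
        return globalSetting || familySetting;
    }

    /** cacheDataOnRead in the quoted call: taken from the column family setting alone. */
    static boolean cacheDataOnRead(boolean familyBlockCacheEnabled) {
        return familyBlockCacheEnabled;
    }

    public static void main(String[] args) {
        // A family with its block cache switched off, under a global config that enables cache-on-write.
        System.out.println("cacheDataOnRead  = " + cacheDataOnRead(false));
        System.out.println("cacheDataOnWrite = " + globalOrFamily(true, false));
    }
}
```

Under these assumptions, a family with the block cache turned off yields cacheDataOnRead == false no matter what the global configuration says, which is the point being made about the CF-level setting.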




[jira] [Resolved] (HBASE-13864) HColumnDescriptor should parse the output from master and from describe for TTL

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu resolved HBASE-13864.

Resolution: Fixed

Resolving.

Can either reopen for backport or open new JIRA.

 HColumnDescriptor should parse the output from master and from describe for 
 TTL
 ---

 Key: HBASE-13864
 URL: https://issues.apache.org/jira/browse/HBASE-13864
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Elliott Clark
Assignee: Ashu Pachauri
 Fix For: 2.0.0

 Attachments: 13864-branch-1.txt, HBASE-13864-1.patch, 
 HBASE-13864-2.patch, HBASE-13864-3.patch, HBASE-13864-4.patch, 
 HBASE-13864.patch


 The TTL printing on HColumnDescriptor adds a human readable time. When using 
 that string for the create command it throws an error.
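
 A minimal sketch of tolerant TTL parsing follows. It assumes the human-readable form looks like "86400 SECONDS (1 DAY)" and that "FOREVER" maps to Integer.MAX_VALUE; both are assumptions for illustration, and the class TtlParser is hypothetical rather than the code from the attached patches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser that accepts both raw seconds and an assumed human-readable TTL form. */
final class TtlParser {

    // Leading integer, optional "SECOND(S)" unit, optional parenthesised annotation.
    private static final Pattern TTL = Pattern.compile(
        "^(\\d+)(?:\\s+SECONDS?)?(?:\\s*\\(.*\\))?$", Pattern.CASE_INSENSITIVE);

    /** Returns the TTL in seconds, or throws IllegalArgumentException for unrecognized input. */
    static int parseTtlSeconds(String text) {
        String trimmed = text.trim();
        if (trimmed.equalsIgnoreCase("FOREVER")) {
            return Integer.MAX_VALUE; // assumed sentinel for "never expires"
        }
        Matcher m = TTL.matcher(trimmed);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized TTL: " + text);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(parseTtlSeconds("86400"));
        System.out.println(parseTtlSeconds("86400 SECONDS (1 DAY)"));
        System.out.println(parseTtlSeconds("FOREVER"));
    }
}
```

 Accepting both forms would let the string printed by describe be pasted back into a create command without manual editing.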





[jira] [Created] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module

2015-08-03 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-14181:
---

 Summary: Add Spark DataFrame DataSource to HBase-Spark Module
 Key: HBASE-14181
 URL: https://issues.apache.org/jira/browse/HBASE-14181
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Malaska
Assignee: Ted Malaska
Priority: Minor


Build a RelationProvider for HBase-Spark Module.







[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652677#comment-14652677
 ] 

Andrew Purtell commented on HBASE-14122:


Any further concerns [~anoop.hbase] ? I read your comment as a lgtm plus a 
question, so will proceed with commit tomorrow, or please let me know if I am 
mistaken.


 Client API for determining if server side supports cell level security
 --

 Key: HBASE-14122
 URL: https://issues.apache.org/jira/browse/HBASE-14122
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0

 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, 
 HBASE-14122.patch, HBASE-14122.patch


 Add a client API for determining if the server side supports cell level 
 security. 
 Ask the master, assuming as we do in many other instances that the master and 
 regionservers all have a consistent view of site configuration.
 Return {{true}} if all features required for cell level security are present, 
 {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master 
 does not have support for the RPC call.
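
 The three outcomes described above (true, false, or UnsupportedOperationException) could be handled by a caller roughly as follows. This is a sketch under stated assumptions: the SecurityProbe interface is a hypothetical stand-in for the proposed client call, not the API added by the patch.

```java
import java.util.Optional;

/** Hypothetical caller-side handling of a tri-state capability check. */
final class CellSecurityCheck {

    /** Stand-in for the proposed client API; the real method name may differ. */
    interface SecurityProbe {
        boolean supportsCellLevelSecurity();
    }

    /**
     * Returns Optional.of(true/false) when the master answers, or Optional.empty()
     * when the master does not support the RPC and the answer is unknown.
     */
    static Optional<Boolean> probe(SecurityProbe master) {
        try {
            return Optional.of(master.supportsCellLevelSecurity());
        } catch (UnsupportedOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(() -> true));
        System.out.println(probe(() -> false));
        System.out.println(probe(() -> { throw new UnsupportedOperationException("old master"); }));
    }
}
```

 Keeping "unknown" distinct from "false" lets a client talking to an older master fall back to attempting the operation instead of refusing outright.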





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652688#comment-14652688
 ] 

Hudson commented on HBASE-13965:


FAILURE: Integrated in HBase-TRUNK #6694 (See 
[https://builds.apache.org/job/HBase-TRUNK/6694/])
HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 
20d1fa36e7ffa1c8d274def831223bff9b04fa69)
* 
hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 
598cfeb77563a3fea9d0ed467025514662e52ca0)
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/Filter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterWrapper.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652765#comment-14652765
 ] 

Lei Chen commented on HBASE-13965:
--

+1

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png




[jira] [Comment Edited] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652809#comment-14652809
 ] 

Andrew Purtell edited comment on HBASE-14122 at 8/3/15 11:56 PM:
-

bq. Should all these be refactored to use the new master API for checking 
security support?

Let me look into that. Good suggestion. 

Would changing how/if exceptions are thrown when using the AccessControlClient 
and VisibilityClient be a backwards compatibility concern?

At least with the shell, we can avoid ugly nits by checking security feature 
flags in advance if the API is available. That would also handle the case where 
the new master API isn't available. See what the shell does for the new 
list_security_capabilities command. 


was (Author: apurtell):
bq. Should all these be refactored to use the new master API for checking 
security support?

Let me look into that. Good suggestion.



[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes

2015-08-03 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652873#comment-14652873
 ] 

Ben Lau commented on HBASE-13830:
-

Hey Ryan, do you have more information on this bug? We are interested in using 
the reverse scan feature at Yahoo and would like to clear up any known bugs 
before internal users take it up for production use. If you have, for example, 
an independent program and/or data that could be used to reproduce this issue, 
we would like to see it. If you cannot reproduce the bug anymore, we'd like to 
know anything else you remember, such as the version of HDFS, any custom patches 
you had on your version of HBase, the table schema at the time (e.g. any 
particular block encodings), etc.

 Hbase REVERSED may throw Exception sometimes
 

 Key: HBASE-13830
 URL: https://issues.apache.org/jira/browse/HBASE-13830
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.1
Reporter: ryan.jin

 Run a scan in the HBase shell:
 {code}
 scan 'analytics_access', {ENDROW => '9223370603647713262-flume01.hadoop-10.32.117.111-373563509', LIMIT => 10, REVERSED => true}
 {code}
 This throws an exception:
 {code}
 java.io.IOException: java.io.IOException: Could not seekToPreviousRow 
 StoreFileScanner[HFileScanner for reader 
 reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6,
  compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], 
 firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put,
  
 lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put,
  avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, 
 cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0]
  to key 
 9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0
   at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448)
   at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88)
   at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128)
   at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88)
   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503)
   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814)
   at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805)
   at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136)
   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012)
   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160)
   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38)
   at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.IOException: On-disk size without header provided is 47701, but block header contains 10134. Block offset: -1, data starts with: DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00'
   at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451)
   at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87)
   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466)
   at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355)
   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569)
   at 
 

[jira] [Resolved] (HBASE-14180) Change timeout - SocketTimeoutException because of callTimeout

2015-08-03 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-14180.

Resolution: Invalid

Please write in to u...@hbase.apache.org for help troubleshooting issues. This 
is the project dev tracker. Thanks!

 Change timeout - SocketTimeoutException because of callTimeout
 --

 Key: HBASE-14180
 URL: https://issues.apache.org/jira/browse/HBASE-14180
 Project: HBase
  Issue Type: Bug
  Components: hbase, regionserver, rpc, Zookeeper
Affects Versions: 1.1.1
 Environment: Hadoop with Ambari 2.1.0
 HBase 1.1.1.2.3
 HDFS 2.7.1.2.3
 Zookeeper 3.4.6.2.3
 Phoenix 4.4.0.2.3
Reporter: Adrià V.

 HBase keeps throwing a timeout exception. I have tried every configuration I 
 could think of to increase the timeout.
 Partial stacktrace:
 {quote}
 Caused by: java.io.IOException: Call to 
 hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020 failed on local exception: 
 org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, 
 operationTimeout=6 expired.
 at 
 org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1242)
 at 
 org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1210)
 at 
 org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
 at 
 org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
 at 
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
 at 
 org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
 at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
 at 
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
 at 
 org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
 at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
 ... 4 more
 Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, 
 waitTime=60001, operationTimeout=6 expired.
 at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
 at 
 org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184)
 ... 13 more
 {quote}
 I've tried editing the config files and also setting the following keys in 
 Ambari to increase the timeout, with no success:
 - hbase.rpc.timeout
 - dfs.socket.timeout
 - dfs.client.socket-timeout
 - zookeeper.session.timeout
 Also the Phoenix properties, but I think it's mostly an HBase issue:
 - phoenix.query.timeoutMs
 - phoenix.query.keepAliveMs
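A minimal sketch of how the keys listed above could be written out as `hbase-site.xml` property entries, for anyone reproducing the attempt. The class name `TimeoutOverrides` and the 120000 ms values are assumptions chosen for illustration; the report does not state which values were tried, and nothing here implies these keys control the timeout seen in the trace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: renders timeout overrides as hbase-site.xml property entries.
final class TimeoutOverrides {
    static String toSiteXml(Map<String, Long> overridesMs) {
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, Long> e : overridesMs.entrySet()) {
            xml.append("<property>\n")
               .append("  <name>").append(e.getKey()).append("</name>\n")
               .append("  <value>").append(e.getValue()).append("</value>\n")
               .append("</property>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        // Keys come from the report; the values are illustrative only.
        Map<String, Long> overrides = new LinkedHashMap<>();
        overrides.put("hbase.rpc.timeout", 120000L);
        overrides.put("phoenix.query.timeoutMs", 120000L);
        System.out.print(toSiteXml(overrides));
    }
}
```

Rendering the entries from one map keeps the HBase and Phoenix values side by side, which makes it easier to confirm that both were actually raised.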
 Full stack trace: 
 {quote}
 Error: Encountered exception in sub plan [0] execution. (state=,code=0)
 java.sql.SQLException: Encountered exception in sub plan [0] execution.
 at 
 org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157)
 at 
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251)
 at 
 org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241)
 at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
 at 
 org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240)
 at 
 org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250)
 at sqlline.Commands.execute(Commands.java:822)
 at sqlline.Commands.sql(Commands.java:732)
 at sqlline.SqlLine.dispatch(SqlLine.java:808)
 at sqlline.SqlLine.begin(SqlLine.java:681)
 at sqlline.SqlLine.start(SqlLine.java:398)
 at sqlline.SqlLine.main(SqlLine.java:292)
 Caused by: org.apache.phoenix.exception.PhoenixIOException: 
 org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, 
 exceptions:
 Mon Aug 03 16:47:06 UTC 2015, null, java.net.SocketTimeoutException: 
 callTimeout=6, callDuration=60303: row '' on table 'hive_post_topics' at 
 region=hive_post_topics,,1438084107396.cdbdc246ff0b7dfed31d481e0bccd2b5., 
 hostname=hdp-w-1.c.dks-hadoop.internal,16020,1438619912282, seqNum=45322
 at 
 org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
 at 
 org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:542)
 at 
 org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
 at 
 

[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652809#comment-14652809
 ] 

Andrew Purtell commented on HBASE-14122:


bq. Should all these be refactored to use the new master API for checking 
security support?

Let me look into that. Good suggestion.

 Client API for determining if server side supports cell level security
 --

 Key: HBASE-14122
 URL: https://issues.apache.org/jira/browse/HBASE-14122
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0

 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, 
 HBASE-14122.patch, HBASE-14122.patch


 Add a client API for determining if the server side supports cell level 
 security. 
 Ask the master, assuming as we do in many other instances that the master and 
 regionservers all have a consistent view of site configuration.
 Return {{true}} if all features required for cell level security are present, 
 {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master 
 does not have support for the RPC call.
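The three outcomes described above (true, false, or an exception from an older master) could be handled by a caller roughly as follows. This is only a sketch: `SecurityCapabilityProbe`, `isCellLevelSecuritySupported` and `CellSecurityCheck` are hypothetical names standing in for whatever the patch actually exposes.

```java
// Hypothetical stand-in for the proposed client call; the real method name may differ.
interface SecurityCapabilityProbe {
    boolean isCellLevelSecuritySupported();
}

final class CellSecurityCheck {
    enum Support { AVAILABLE, UNAVAILABLE, UNKNOWN_OLD_MASTER }

    static Support probe(SecurityCapabilityProbe admin) {
        try {
            return admin.isCellLevelSecuritySupported()
                ? Support.AVAILABLE
                : Support.UNAVAILABLE;
        } catch (UnsupportedOperationException e) {
            // The master predates the RPC, so the answer cannot be determined this way.
            return Support.UNKNOWN_OLD_MASTER;
        }
    }
}
```

Mapping the exception to its own state keeps "not supported" distinct from "cannot tell", which matters when talking to a master that lacks the RPC.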





[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
Attachment: 13965-addendum.txt

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
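To illustrate the kind of attribution being asked for, the sketch below assumes the overall cost is the weighted sum of the individual cost function results and reports each function's share of that total. The class name `CostAttribution` and the input layout (weight, cost pairs keyed by function name) are assumptions for illustration, not the patch's actual metrics code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: attributes a weighted total cost to its component functions.
final class CostAttribution {
    // Each value is {weight, cost}. Returns each function's share of the
    // weighted total, or an empty map when the total is zero.
    static Map<String, Double> shares(Map<String, double[]> weightAndCost) {
        double total = 0.0;
        for (double[] wc : weightAndCost.values()) {
            total += wc[0] * wc[1];
        }
        Map<String, Double> out = new LinkedHashMap<>();
        if (total == 0.0) {
            return out;
        }
        for (Map.Entry<String, double[]> e : weightAndCost.entrySet()) {
            double[] wc = e.getValue();
            out.put(e.getKey(), wc[0] * wc[1] / total);
        }
        return out;
    }
}
```

Exposing the per-function shares alongside the total is what lets an operator see which weight to adjust when one cost dominates.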





[jira] [Updated] (HBASE-12712) skipLargeFiles in minor compact but not in major compact

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-12712:
---
Status: Open  (was: Patch Available)

 skipLargeFiles in minor compact but not in major compact
 

 Key: HBASE-12712
 URL: https://issues.apache.org/jira/browse/HBASE-12712
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Affects Versions: 0.98.6
Reporter: Liu Junhong
  Labels: beginner
 Fix For: 0.98.6

 Attachments: compact.diff

   Original Estimate: 72h
  Remaining Estimate: 72h

 Here is my case. After repeated minor compactions, the storefile becomes very 
 large. Compacting a large storefile wastes a lot of bandwidth, so I use 
 “hbase.hstore.compaction.max.size” to skip such files. But after reading the 
 source code, I found that with this config set, major compaction will be 
 skipped forever, so deletes and multi-version data may waste storage. So I 
 had to modify the code. 
 I am now trying to submit my patch, but it is not perfect. I think there 
 should be another config in HColumnDescriptor to determine whether large 
 storefiles should take part in major compaction.
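The behaviour proposed above can be sketched as a selection rule: minor compactions always skip files above the size limit, while major compactions include them only when a separate switch is on. Everything here is hypothetical (`CompactionSelection`, the `majorIncludesLargeFiles` flag, sizes as plain longs); it is not HBase's compaction policy code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical selection rule, not HBase's actual compaction policy.
final class CompactionSelection {
    static List<Long> select(List<Long> fileSizes, long maxSize,
                             boolean major, boolean majorIncludesLargeFiles) {
        List<Long> selected = new ArrayList<>();
        for (long size : fileSizes) {
            boolean tooLarge = size > maxSize;
            // Large files are kept only for a major compaction with the switch on.
            if (!tooLarge || (major && majorIncludesLargeFiles)) {
                selected.add(size);
            }
        }
        return selected;
    }
}
```

With the switch off, major and minor compactions select the same files, which is the situation the report describes.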





[jira] [Updated] (HBASE-12815) Deprecate 0.89-fb specific Data Structures like KeyValue, WALEdit etc

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-12815:
---
Status: Open  (was: Patch Available)

 Deprecate 0.89-fb specific Data Structures like KeyValue, WALEdit etc
 -

 Key: HBASE-12815
 URL: https://issues.apache.org/jira/browse/HBASE-12815
 Project: HBase
  Issue Type: Sub-task
  Components: wal
Reporter: Rishit Shroff
Assignee: Rishit Shroff
Priority: Minor
 Attachments: 
 0001-HBASE-12815-Remove-HBase-specific-Data-structures-li.patch


 OSS HBase has different versions of these data structures, and the current 
 module was retaining the old ones from 0.89-fb. 





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652936#comment-14652936
 ] 

Hudson commented on HBASE-13965:


SUCCESS: Integrated in HBase-1.3-IT #68 (See 
[https://builds.apache.org/job/HBase-1.3-IT/68/])
HBASE-13965 Revert due to test failure in TestAssignmentManager (tedyu: rev 
24dbe25e95d0a355b2e07aa94b5921ff4b4865e9)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652658#comment-14652658
 ] 

Hadoop QA commented on HBASE-13965:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12748522/HBASE-13965-v11.patch
  against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133.
  ATTACHMENT ID: 12748522

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14964//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14964//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14964//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14964//console

This message is automatically generated.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
Fix Version/s: (was: 1.3.0)

Test failure in TestAssignmentManager is reproducible.
Reverted from branch-1 for now.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Commented] (HBASE-12865) WALs may be deleted before they are replicated to peers

2015-08-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652735#comment-14652735
 ] 

Andrew Purtell commented on HBASE-12865:


Sorry this has sat for a while.

Handles KeeperExceptions better. The new unit test 
testFailoverDeadServerCversionChange verifies the ZK behavior we are expecting. 
Could go in as an improvement. Nice to have: a unit test that confirms we use 
the queues znode cversion correctly.


 WALs may be deleted before they are replicated to peers
 ---

 Key: HBASE-12865
 URL: https://issues.apache.org/jira/browse/HBASE-12865
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Liu Shaohui
Assignee: He Liangliang
Priority: Critical
 Attachments: HBASE-12865-V1.diff, HBASE-12865-V2.diff


 By design, ReplicationLogCleaner guarantees that WALs still in a 
 replication queue can't be deleted by the HMaster. The 
 ReplicationLogCleaner gets the WAL set from ZooKeeper by scanning the 
 replication zk node. But it may get an incomplete WAL set during replication 
 failover, because the scan operation is not atomic.
 For example: There are three region servers: rs1, rs2, rs3, and peer id 10.  
 The layout of replication zookeeper nodes is:
 {code}
 /hbase/replication/rs/rs1/10/wals
  /rs2/10/wals
  /rs3/10/wals
 {code}
 - t1: the ReplicationLogCleaner finishes scanning the replication queue of 
 rs1, and starts to scan the queue of rs2.
 - t2: region server rs3 goes down, and rs1 takes over rs3's replication queue. 
 The new layout is
 {code}
 /hbase/replication/rs/rs1/10/wals
  /rs1/10-rs3/wals
  /rs2/10/wals
  /rs3
 {code}
 - t3: the ReplicationLogCleaner finishes scanning the queue of rs2, and starts 
 to scan the node of rs3. But the queue has already been moved to 
 replication/rs1/10-rs3/WALS
 So the ReplicationLogCleaner will miss the WALs of rs3 for peer 10, and the 
 HMaster may delete these WALs before they are replicated to peer clusters.
 We encountered this problem in our cluster and I think it's a serious 
 replication bug.
 Suggestions for fixing this bug are welcome. thx~
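One way to guard a non-atomic scan like the one in the t1 to t3 timeline is to read a version counter before and after the walk and retry if it moved. The sketch below assumes such a counter exists and changes whenever a region server's queue node is added or removed (the review comment on this issue mentions the queues znode cversion). `ReplicationQueues` and `ConsistentWalScan` are hypothetical names, not HBase or ZooKeeper APIs.

```java
import java.util.Set;

// Hypothetical view of the replication znodes; not an HBase or ZooKeeper API.
interface ReplicationQueues {
    int queuesCversion();        // assumed to change when queue nodes are added or removed
    Set<String> scanAllWals();   // non-atomic walk over every region server's queues
}

final class ConsistentWalScan {
    static Set<String> scan(ReplicationQueues queues, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int before = queues.queuesCversion();
            Set<String> wals = queues.scanAllWals();
            if (queues.queuesCversion() == before) {
                // The counter did not move, so no queue node changed during the walk.
                return wals;
            }
        }
        throw new IllegalStateException("replication queues kept changing during scan");
    }
}
```

In the scenario above, the removal of rs3's node at t2 would change the counter, so the cleaner would rescan and pick up the WALs under rs1's 10-rs3 queue instead of missing them.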





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652947#comment-14652947
 ] 

Hudson commented on HBASE-13965:


SUCCESS: Integrated in HBase-1.3 #86 (See 
[https://builds.apache.org/job/HBase-1.3/86/])
HBASE-13965 Revert due to test failure in TestAssignmentManager (tedyu: rev 
24dbe25e95d0a355b2e07aa94b5921ff4b4865e9)
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* 
hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13986:
---
Status: Open  (was: Patch Available)

 HMaster instance always returns false for isAborted() check.
 

 Key: HBASE-13986
 URL: https://issues.apache.org/jira/browse/HBASE-13986
 Project: HBase
  Issue Type: Bug
Reporter: Abhishek Kumar
Assignee: Abhishek Kumar
Priority: Minor
 Attachments: HBASE-13986.patch


 It seems that HMaster never sets the abortRequested flag to true, as 
 HRegionServer does in its abort() method. The isAborted() method is used in a 
 few places on the HMaster instance (like in HMasterCommandLine.startMaster), 
 where the code flow is determined by the result of the isAborted() call.
 We can set the abortRequested flag in HMaster's abort() method as well, as 
 in HRegionServer's abort() method. Let me know if that seems OK. 
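The proposed change amounts to recording the abort request inside abort() so that isAborted() can report it. A minimal stand-alone sketch of that pattern follows; `AbortableServer` is a hypothetical class, not the actual HMaster or HRegionServer code, and the use of `AtomicBoolean` is an assumption made so the flag is visible across threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal stand-in for a server process; not the actual HMaster class.
final class AbortableServer {
    private final AtomicBoolean abortRequested = new AtomicBoolean(false);

    void abort(String why, Throwable cause) {
        // Record the request first so isAborted() reflects it even if later shutdown work fails.
        abortRequested.set(true);
        // Logging and shutdown steps would follow here.
    }

    boolean isAborted() {
        return abortRequested.get();
    }
}
```

Callers that branch on isAborted(), such as a startup wrapper deciding whether to report failure, then see the abort instead of a constant false.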





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652654#comment-14652654
 ] 

Hudson commented on HBASE-13965:


FAILURE: Integrated in HBase-1.3 #85 (See 
[https://builds.apache.org/job/HBase-1.3/85/])
HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 
c215b900f49685989083df3786bd8441700c248a)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* 
hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* 
hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* 
hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Reopened] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reopened HBASE-13965:


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13881) Bug in HTable#incrementColumnValue implementation

2015-08-03 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652710#comment-14652710
 ] 

Nick Dimiduk commented on HBASE-13881:
--

I believe this ticket warrants a release note.

 Bug in HTable#incrementColumnValue implementation
 -

 Key: HBASE-13881
 URL: https://issues.apache.org/jira/browse/HBASE-13881
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.6.1, 1.0.1
Reporter: Jerry Lam
Assignee: Gabor Liptak
 Fix For: 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13881.branch-1.1.patch


 The exact method I'm talking about is:
 {code}
   @Deprecated
   @Override
   public long incrementColumnValue(final byte [] row, final byte [] family,
       final byte [] qualifier, final long amount, final boolean writeToWAL)
       throws IOException {
     return incrementColumnValue(row, family, qualifier, amount,
         writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT);
   }
 {code}
 When writeToWAL is true, Durability is set to SKIP_WAL, which does not 
 make much sense unless the meaning of SKIP_WAL is negated.
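For comparison, the mapping one would expect from the parameter name is sketched below: writing to the WAL keeps the default durability, and only writeToWAL == false skips it. This is an assumption about the intended behaviour, not the committed fix; `WalFlagMapping` is a hypothetical class and its nested enum mirrors just two values of HBase's Durability so the snippet compiles on its own.

```java
// Illustrative mapping; WalFlagMapping is hypothetical and not part of HBase.
final class WalFlagMapping {
    // Local mirror of two Durability values so the sketch is self-contained.
    enum Durability { USE_DEFAULT, SKIP_WAL }

    static Durability fromWriteToWal(boolean writeToWAL) {
        // true means the edit should go to the WAL, so the WAL is skipped only on false.
        return writeToWAL ? Durability.USE_DEFAULT : Durability.SKIP_WAL;
    }
}
```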



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
Attachment: 13965-addendum.txt

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
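
As a rough sketch of what attributing each cost function's contribution could look like, the example below assumes the overall cost is a weighted sum of the individual cost function results. That weighted-sum form, the class and method names, and the sample weights and costs are all assumptions made for illustration; they are not taken from the patch or from a real cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CostBreakdownSketch {
    // Overall cost modeled as the sum of weight * cost over all functions.
    // This weighted-sum form is an assumption made for illustration.
    static double overall(Map<String, double[]> functions) {
        double total = 0.0;
        for (double[] wc : functions.values()) {
            total += wc[0] * wc[1]; // wc[0] = weight, wc[1] = raw cost
        }
        return total;
    }

    // Share of the overall cost attributable to one named function.
    static double share(Map<String, double[]> functions, String name) {
        double[] wc = functions.get(name);
        return (wc[0] * wc[1]) / overall(functions);
    }

    public static void main(String[] args) {
        // Made-up weights and costs, purely illustrative.
        Map<String, double[]> f = new LinkedHashMap<>();
        f.put("LocalityCost", new double[] {2.0, 0.5});
        f.put("RegionCountSkewCost", new double[] {1.0, 0.25});
        for (String name : f.keySet()) {
            System.out.println(name + " share = " + share(f, name));
        }
    }
}
```

Per-function values of this kind are what an operator would want to see next to the overall cost when tuning the weights.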





[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module

2015-08-03 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652731#comment-14652731
 ] 

Ted Malaska commented on HBASE-14181:
-

Note that a DataSource in Spark can offer a lot of advanced functionality, such as filter 
push down, scan range push down, and column filters.

This Jira will try to get a base implementation down, but will leave room for 
more advanced functionality in additional jiras.

Ted Malaska

 Add Spark DataFrame DataSource to HBase-Spark Module
 

 Key: HBASE-14181
 URL: https://issues.apache.org/jira/browse/HBASE-14181
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Malaska
Assignee: Ted Malaska
Priority: Minor

 Build a RelationProvider for HBase-Spark Module.





[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652796#comment-14652796
 ] 

Jerry He commented on HBASE-14122:
--

Looks good overall.

We have AccessControlClient and VisibilityClient, both of which go directly to the 
co-processor security endpoints. If calls are made while the security endpoints 
are not installed, they will get an exception like:
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered 
coprocessor service found for name ...

On the shell side, the security commands (grant, revoke) will error out based 
on the non-existence of the 'hbase:acl' table: DISABLED: Security features are not 
available
The visibility commands will error out based on the non-existence of the 
'hbase:labels' table: DISABLED: Visibility labels feature is not available

Should all of these be refactored to use the new master API for checking security 
support?


 Client API for determining if server side supports cell level security
 --

 Key: HBASE-14122
 URL: https://issues.apache.org/jira/browse/HBASE-14122
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0

 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, 
 HBASE-14122.patch, HBASE-14122.patch


 Add a client API for determining if the server side supports cell level 
 security. 
 Ask the master, assuming as we do in many other instances that the master and 
 regionservers all have a consistent view of site configuration.
 Return {{true}} if all features required for cell level security are present, 
 {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master 
 does not have support for the RPC call.
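
The three outcomes described above (true, false, or UnsupportedOperationException) could be handled on the caller side roughly as follows. This is only a sketch: the `check` parameter stands in for the proposed client call, whose actual name and signature are not given in this message, and CellSecurityProbe and Support are hypothetical names.

```java
import java.util.function.BooleanSupplier;

class CellSecurityProbe {
    enum Support { SUPPORTED, NOT_SUPPORTED, UNKNOWN }

    // `check` is a stand-in for the proposed client API call; the real
    // method name and signature are not stated in this message.
    static Support probe(BooleanSupplier check) {
        try {
            return check.getAsBoolean() ? Support.SUPPORTED : Support.NOT_SUPPORTED;
        } catch (UnsupportedOperationException e) {
            // The master does not know the RPC, so the answer is undetermined.
            return Support.UNKNOWN;
        }
    }
}
```

Mapping the exception to a third state keeps callers from treating an older master as if it had answered false.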





[jira] [Commented] (HBASE-14179) catch the same Exception twice

2015-08-03 Thread Gabor Liptak (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652797#comment-14652797
 ] 

Gabor Liptak commented on HBASE-14179:
--

The outer catch is for:

ZKUtil.getData(this.watcher, nodePath);

which also throws InterruptedException.

Both catches are needed.
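
A minimal sketch of why both handlers are required, assuming the structure described above: an outer call that can itself throw InterruptedException, plus an inner sleep with its own handler. Here blockingRead is a hypothetical stand-in for ZKUtil.getData(...), and the class is not HBase code.

```java
import java.io.InterruptedIOException;

class NestedCatchSketch {
    // Hypothetical stand-in for a call such as ZKUtil.getData(...) that is
    // declared to throw InterruptedException.
    static void blockingRead(boolean interrupt) throws InterruptedException {
        if (interrupt) {
            throw new InterruptedException("interrupted in blockingRead");
        }
    }

    static String run(boolean interruptOuterCall) {
        try {
            try {
                blockingRead(interruptOuterCall); // covered only by the outer catch
                try {
                    Thread.sleep(1);              // covered by the inner catch
                } catch (InterruptedException e1) {
                    throw new InterruptedIOException("inner");
                }
            } catch (InterruptedException e) {
                throw new InterruptedIOException("outer");
            }
            return "completed";
        } catch (InterruptedIOException e) {
            // Report which handler did the translation.
            return e.getMessage();
        }
    }
}
```

In this sketch, deleting the outer catch would leave the checked InterruptedException from blockingRead unhandled, so the method would no longer compile.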

 catch the same Exception twice
 --

 Key: HBASE-14179
 URL: https://issues.apache.org/jira/browse/HBASE-14179
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.1, 1.1.0, 1.1.1, 1.1.0.1
Reporter: songwanging
Priority: Minor

 In method markRegionsRecovering() of class: 
 hbase-1.1.1\hbase-server\src\main\java\org\apache\hadoop\hbase\coordination\ZKSplitLogManagerCoordination.java
 InterruptedException is caught twice.
   public void markRegionsRecovering(final ServerName serverName, 
       Set<HRegionInfo> userRegions)
       throws IOException, InterruptedIOException {
     ...
       try {
         Thread.sleep(20);
       } catch (InterruptedException e1) {
         throw new InterruptedIOException();
       }
     } catch (InterruptedException e) {
       throw new InterruptedIOException();
     }
     ...
   }





[jira] [Updated] (HBASE-12923) ResultScanner is not closed in ModifyTableHandler#removeReplicaColumnsIfNeeded()

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-12923:
---
Status: Open  (was: Patch Available)

 ResultScanner is not closed in 
 ModifyTableHandler#removeReplicaColumnsIfNeeded()
 

 Key: HBASE-12923
 URL: https://issues.apache.org/jira/browse/HBASE-12923
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Trivial
 Attachments: 12923-v1.txt


 In ModifyTableHandler#removeReplicaColumnsIfNeeded():
 {code}
   ResultScanner resScanner = metaTable.getScanner(scan);
   for (Result result : resScanner) {
 {code}
 The ResultScanner is not closed upon exit from the method.
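
One common way to guarantee the close is try-with-resources. The sketch below uses FakeScanner, a hypothetical closeable and iterable stand-in, so that it runs without HBase on the classpath; it is not the content of the attached patch.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ScannerCloseSketch {
    // Hypothetical stand-in for a scanner: iterable and closeable.
    static class FakeScanner implements Closeable, Iterable<String> {
        boolean closed = false;
        private final List<String> rows = Arrays.asList("row1", "row2");

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    static int countRows(FakeScanner scanner) {
        int count = 0;
        // try-with-resources calls close() on normal exit and on exceptions.
        try (FakeScanner s = scanner) {
            for (String row : s) {
                count++;
            }
        }
        return count;
    }
}
```

The same shape should carry over to any scanner type that implements Closeable, since the close call no longer depends on how the loop exits.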





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652650#comment-14652650
 ] 

Hudson commented on HBASE-13965:


SUCCESS: Integrated in HBase-1.3-IT #67 (See 
[https://builds.apache.org/job/HBase-1.3-IT/67/])
HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 
c215b900f49685989083df3786bd8441700c248a)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
* hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
Attachment: (was: 13965-addendum.txt)

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652720#comment-14652720
 ] 

Ted Yu commented on HBASE-13965:


The addendum deals with the scenario where the connector port is already in use on the 
test machine.



[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653069#comment-14653069
 ] 

Jerry He commented on HBASE-14122:
--

Another comment.
You seem to be relying on 'UnsupportedOperationException' for backward 
compatibility, depending on the RPC facility throwing it if the method 
cannot be located on the server side?
I have not seen such an example before in the HBase code. This is probably OK. 
Usually we explicitly construct the 'UnsupportedOperationException' at the user 
code level.
Does it work? Will the exception be correctly propagated to the client?



[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Attachment: HBASE-14178-0.98.patch

Uploaded patch for branch 0.98.

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, 
 HBASE-14178_v4.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information is below.
 PS: my table's block cache is off.
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
 - 0x0005e5c55c08 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 {code}





[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Attachment: (was: HBASE-14178-0.98.patch)



[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security

2015-08-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653053#comment-14653053
 ] 

Anoop Sam John commented on HBASE-14122:


bq. Any further concerns, Anoop Sam John? I read your comment as an lgtm plus a 
question, so will proceed with commit tomorrow.
Sorry for not giving an explicit +1 later. Yes, that was a minor question and you 
have already addressed it. I am +1.
Yes, it would be nice to have what Jerry suggested. Even without that, I am +1 :-)



[jira] [Resolved] (HBASE-14179) catch the same Exception twice

2015-08-03 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John resolved HBASE-14179.

Resolution: Invalid



[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Status: Open  (was: Patch Available)



[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Status: Patch Available  (was: Open)

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, 
 HBASE-14178_v4.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information is below.
 PS: my table's block cache is off.
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
 - 0x0005e5c55c08 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 {code}





[jira] [Updated] (HBASE-13825) Get operations on large objects fail with protocol errors

2015-08-03 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-13825:
---
Attachment: HBASE-13825-branch-1.patch
HBASE-13825-0.98.patch
HBASE-13825.patch

I followed the reference to HBASE-14076 over to HBASE-13230. The solution there 
is to use the static helper ProtobufUtil#mergeDelimitedFrom wherever we've 
written a delimited message and would use mergeDelimitedFrom to read it back in, 
since the delimited message format begins with the total message size encoded 
as a varint32. We use the encoded size to adjust the CodedInputStream limit as 
needed. 

Patches here also address relevant uses of builder#mergeFrom. We use 
Integer.MAX_VALUE as the size limit for CodedInputStream where it is not known. 
In some places it's unlikely a message processed there will exceed 64 MB, but I 
made a change anyway. It is harmless and consistent to use 
ProtobufUtil#mergeFrom.
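
To illustrate the delimited format described above, here is a minimal sketch that uses only the JDK. It is not the ProtobufUtil code from the patches: the class and method names are hypothetical, and it only shows a varint32 length prefix being read first so that the reader is bounded by the encoded size rather than by a fixed cap.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Illustrative only: hypothetical names, not the HBase ProtobufUtil implementation.
final class DelimitedFrameSketch {

  // Writes a base-128 varint32 length prefix followed by the payload bytes.
  static byte[] writeDelimited(byte[] payload) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int size = payload.length;
    while ((size & ~0x7F) != 0) {
      out.write((size & 0x7F) | 0x80); // low 7 bits, continuation bit set
      size >>>= 7;
    }
    out.write(size);
    out.write(payload, 0, payload.length);
    return out.toByteArray();
  }

  // Reads the varint32 length prefix (at most 5 bytes).
  static int readVarint32(InputStream in) throws IOException {
    int result = 0;
    for (int shift = 0; shift < 32; shift += 7) {
      int b = in.read();
      if (b < 0) {
        throw new EOFException("truncated length prefix");
      }
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
    }
    throw new IOException("malformed varint32");
  }

  // Reads one frame, using the encoded size as the read limit.
  static byte[] readDelimited(InputStream in) throws IOException {
    int size = readVarint32(in);
    if (size < 0) {
      throw new IOException("negative size");
    }
    byte[] payload = new byte[size];
    int off = 0;
    while (off < size) {
      int n = in.read(payload, off, size - off);
      if (n < 0) {
        throw new EOFException("truncated payload");
      }
      off += n;
    }
    return payload;
  }
}
```

Because the size is known before the body is parsed, a reader can raise its limit to exactly that size for one message instead of relying on a global default.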

branch-1 and 0.98 patches also incorporate HBASE-14076.

Reviewboard: https://reviews.apache.org/r/37062/

/cc [~stack] Touched a lot of your code here.

 Get operations on large objects fail with protocol errors
 -

 Key: HBASE-13825
 URL: https://issues.apache.org/jira/browse/HBASE-13825
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.0.0, 1.0.1
Reporter: Dev Lakhani
Assignee: Andrew Purtell
 Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0

 Attachments: HBASE-13825-0.98.patch, HBASE-13825-branch-1.patch, 
 HBASE-13825.patch


 When performing a get operation on a column family with more than 64MB of 
 data, the operation fails with:
 Caused by: Portable(java.io.IOException): Call to host:port failed on local 
 exception: com.google.protobuf.InvalidProtocolBufferException: Protocol 
 message was too large.  May be malicious.  Use 
 CodedInputStream.setSizeLimit() to increase the size limit.
 at 
 org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481)
 at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453)
 at 
 org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653)
 at 
 org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308)
 at 
 org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381)
 at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753)
 at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751)
 at 
 org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120)
 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756)
 at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765)
 at 
 org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395)
 This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but 
 that issue is related to cluster status. 
 Scan and put operations on the same data work fine
 Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients.





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653005#comment-14653005
 ] 

Hudson commented on HBASE-13965:


FAILURE: Integrated in HBase-TRUNK #6695 (See 
[https://builds.apache.org/job/HBase-TRUNK/6695/])
HBASE-13965 Addendum tries different connector ports if BindException is 
encountered (tedyu: rev 931e77d4507e1650c452cefadda450e0bf3f0528)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0

 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, 
 HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, 
 HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, 
 HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, 
 HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
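
As a rough illustration of the attribution problem described above, the sketch below assumes the overall cost is a weighted sum of the individual cost functions. That assumption, the class name, and the sample numbers are illustrative only and are not taken from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: assumes overall cost = sum of (weight * raw cost) per cost function.
final class CostBreakdownSketch {

  // Returns each function's weighted cost, plus an "Overall" entry holding their sum.
  static Map<String, Double> breakdown(Map<String, Double> weights,
                                       Map<String, Double> rawCosts) {
    Map<String, Double> out = new LinkedHashMap<>();
    double total = 0.0;
    for (Map.Entry<String, Double> e : rawCosts.entrySet()) {
      double weight = weights.getOrDefault(e.getKey(), 0.0);
      double weighted = weight * e.getValue();
      out.put(e.getKey(), weighted);
      total += weighted;
    }
    out.put("Overall", total);
    return out;
  }
}
```

Exposing the per-function entries next to the overall value is what would let an operator see which weight dominates a balancing decision.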





[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651691#comment-14651691
 ] 

Anoop Sam John commented on HBASE-14178:


Does this mean that when a scan comes in with cache blocks set to false, we 
won't even consult the block cache for that read?  IMO that is wrong. When the 
block is available in the cache, we should try to read it from there. Agreed, 
though: when the table CF is set not to cache any data from that CF, there is 
no point in looking into the cache. So when the BC is disabled, or the CF being 
read is set to cache block = false, there is no need to obtain the lock at all. 
But the code change seems to do more than this.

So IMO there can be 2 changes in code
1. Move the below piece of code inside the check  if 
(cacheConf.isBlockCacheEnabled())
{code}
if (!useLock) {
  // check cache again with lock
  useLock = true;
  continue;
}
{code}
2. The outer if check (i.e. if (cacheConf.isBlockCacheEnabled())) should itself 
be changed to include the check for the CF-level cache setting 
(HCD#setBlockCacheEnabled).
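
The two changes can be sketched with a simplified, self-contained model of the read loop. Everything here is hypothetical: a single ReentrantLock stands in for the per-offset IdLock, one boolean stands in for the combined server-level and CF-level cache check, and the counters exist only to make the behaviour observable in a single-threaded run.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative only: a simplified model, not HFileReaderV2.
final class ReadBlockSketch {
  // Stands in for cacheConf.isBlockCacheEnabled() combined with the CF-level setting.
  private final boolean cacheEnabled;
  private final Map<Long, String> cache = new ConcurrentHashMap<>();
  // A single lock instead of a per-offset lock, to keep the sketch short.
  private final ReentrantLock offsetLock = new ReentrantLock();
  int lockAcquisitions = 0;
  int diskReads = 0;

  ReadBlockSketch(boolean cacheEnabled) {
    this.cacheEnabled = cacheEnabled;
  }

  String readBlock(long offset) {
    boolean useLock = false;
    boolean locked = false;
    try {
      while (true) {
        if (useLock) {
          offsetLock.lock();
          locked = true;
          lockAcquisitions++;
        }
        if (cacheEnabled) {
          String cached = cache.get(offset);
          if (cached != null) {
            return cached;
          }
          // Change 1: the retry-with-lock step now lives inside the cache check,
          // so it is skipped entirely when nothing can be cached.
          if (!useLock) {
            useLock = true;
            continue;
          }
        }
        // Load the block from the filesystem (simulated here).
        diskReads++;
        String block = "block@" + offset;
        if (cacheEnabled) {
          cache.put(offset, block);
        }
        return block;
      }
    } finally {
      if (locked) {
        offsetLock.unlock();
      }
    }
  }
}
```

With the cache disabled, every read goes straight to the simulated disk without touching the lock; with it enabled, only a cache miss pays for the lock.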

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information below:
 PS:  my table's block cache is off
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 

[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Attachment: HBASE-14178_v3.patch

Updated the patch: modified TestFromClientSide.testCacheOnWriteEvictOnClose so 
that, after compaction, we don't modify expectedBlockHits.

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information below:
 PS:  my table's block cache is off
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
 - 0x0005e5c55c08 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 {code}





[jira] [Commented] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module

2015-08-03 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651962#comment-14651962
 ] 

Ted Malaska commented on HBASE-14150:
-

Initial tests are successful.  I'm going to do some clean up and more tests and 
I will submit a patch soon.

 Add BulkLoad functionality to HBase-Spark Module
 

 Key: HBASE-14150
 URL: https://issues.apache.org/jira/browse/HBASE-14150
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska

 Add on to the work done in HBASE-13992 to add functionality to do a bulk load 
 from a given RDD.
 This will do the following:
 1. figure out the number of regions and sort and partition the data correctly 
 to be written out to HFiles
 2. Also, unlike the MR bulkload, I would like the columns to be sorted in 
 the shuffle stage and not in the memory of the reducer.  This will allow this 
 design to support super wide records without running out of memory.
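
The first step (assign rows to regions, then sort within each region) can be sketched in plain Java without Spark. This is only an illustration of the partitioning idea: it uses String keys where HBase would compare byte arrays, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: String keys stand in for HBase byte[] row keys.
final class RegionPartitionSketch {

  // Groups row keys under the region whose start key is the greatest one <= the row key,
  // then sorts the rows inside each region.
  static Map<String, List<String>> partition(List<String> regionStartKeys,
                                             List<String> rowKeys) {
    TreeMap<String, List<String>> byRegion = new TreeMap<>();
    for (String start : regionStartKeys) {
      byRegion.put(start, new ArrayList<>());
    }
    for (String row : rowKeys) {
      Map.Entry<String, List<String>> region = byRegion.floorEntry(row);
      if (region == null) {
        throw new IllegalArgumentException("no region covers row " + row);
      }
      region.getValue().add(row);
    }
    for (List<String> rows : byRegion.values()) {
      Collections.sort(rows);
    }
    return byRegion;
  }
}
```

The first region's start key is assumed to be the empty string, so every row key falls into some region.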





[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651976#comment-14651976
 ] 

Anoop Sam John commented on HBASE-14178:


{code}
  public boolean shouldCacheBlockOnRead(BlockCategory category) {
    return isBlockCacheEnabled()
        && (cacheDataOnRead ||
            category == BlockCategory.INDEX ||
            category == BlockCategory.BLOOM ||
            (prefetchOnOpen &&
                (category != BlockCategory.META &&
                 category != BlockCategory.UNKNOWN)));
  }
{code}
You can see that the call to 
cacheConf.shouldCacheBlockOnRead(expectedBlockType.getCategory()) checks 
whether the read request asks for the block to be cached after this read.  It 
does not tell us about the CF-level setting of whether data is to be cached at 
all.  We have to make that info available here in HFileReader.  And when we 
read the index or meta blocks we have to consult the BC (which you already do).
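
The distinction can be sketched as two separate predicates. Only shouldCacheBlockOnRead mirrors the method quoted above; the cfCachesData flag and the mayBeInCache method are hypothetical names introduced for illustration, not existing CacheConfig members.

```java
// Illustrative only: names other than shouldCacheBlockOnRead are hypothetical.
final class CachePolicySketch {
  enum BlockCategory { DATA, META, INDEX, BLOOM, UNKNOWN }

  private final boolean blockCacheEnabled; // a block cache is configured
  private final boolean cfCachesData;      // hypothetical CF-level "cache data blocks" setting
  private final boolean cacheDataOnRead;   // what this particular read request asks for
  private final boolean prefetchOnOpen;

  CachePolicySketch(boolean blockCacheEnabled, boolean cfCachesData,
                    boolean cacheDataOnRead, boolean prefetchOnOpen) {
    this.blockCacheEnabled = blockCacheEnabled;
    this.cfCachesData = cfCachesData;
    this.cacheDataOnRead = cacheDataOnRead;
    this.prefetchOnOpen = prefetchOnOpen;
  }

  // Mirrors the quoted method: should the block be cached after this read?
  boolean shouldCacheBlockOnRead(BlockCategory category) {
    return blockCacheEnabled
        && (cacheDataOnRead
            || category == BlockCategory.INDEX
            || category == BlockCategory.BLOOM
            || (prefetchOnOpen
                && category != BlockCategory.META
                && category != BlockCategory.UNKNOWN));
  }

  // Hypothetical: could a block of this category be in the cache at all?
  // Non-data blocks always consult the cache; data blocks only if the CF caches data.
  boolean mayBeInCache(BlockCategory category) {
    return blockCacheEnabled && (category != BlockCategory.DATA || cfCachesData);
  }
}
```

A scan that sets cache blocks to false makes the first predicate false for data blocks, yet the second can still be true, which is the case where the cache should still be consulted.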

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, 
 HBASE-14178_v4.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information below:
 PS:  my table's block cache is off
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at 

[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652247#comment-14652247
 ] 

Lei Chen commented on HBASE-14082:
--

Would it be simpler if we put the replica_id also in the Regions instead of 
creating a new MBean? 
The replica id can be queried using wildcard matching, without the need of 
searching in the name to replica_id map.

e.g.
{code}
Regions: {
namespace_default_table_foo_region_aaabbb_metric_mutateCount: 100,
namespace_default_table_foo_region_aaabbb_metric_replicaid: 0,
namespace_default_table_foo_region_bbbccc_metric_mutateCount: 100,
namespace_default_table_foo_region_bbbccc_metric_replicaid: 1,
}
{code}
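
A possible way to query such attributes by pattern is sketched below. It assumes the Regions attributes have already been fetched into a plain map; the class name and the regular expression are hypothetical, and the pattern is only reliable when namespace, table and region names do not themselves contain the separator tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: operates on an in-memory map, not on a live MBean.
final class ReplicaIdQuerySketch {
  private static final Pattern REPLICA_KEY =
      Pattern.compile("namespace_(.+)_table_(.+)_region_(.+)_metric_replicaid");

  // Returns region name -> replica id for every attribute matching the replicaid pattern.
  static Map<String, Long> replicaIds(Map<String, Long> regionsAttributes) {
    Map<String, Long> out = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : regionsAttributes.entrySet()) {
      Matcher m = REPLICA_KEY.matcher(e.getKey());
      if (m.matches()) {
        out.put(m.group(3), e.getValue());
      }
    }
    return out;
  }
}
```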

 Add replica id to JMX metrics names
 ---

 Key: HBASE-14082
 URL: https://issues.apache.org/jira/browse/HBASE-14082
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch


 Today, via JMX, one cannot distinguish a primary region from a replica. A 
 possible solution is to add replica id to JMX metrics names. The benefits may 
 include, for example:
 # Knowing the latency of a read request on a replica region means the first 
 attempt to the primary region has timed out.
 # Write requests on replicas are due to the replication process, while the 
 ones on primary are from clients.
 # In case of looking for hot spots of read operations, replicas should be 
 excluded since TIMELINE reads are sent to all replicas.
 To implement, we can change the format of metrics names found at 
 {code}Hadoop-HBase-RegionServer-Regions-Attributes{code}
 from 
 {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code}
 to
 {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code}





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652252#comment-14652252
 ] 

Hadoop QA commented on HBASE-13965:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12748476/HBASE-13965-v10.patch
  against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133.
  ATTACHMENT ID: 12748476

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.sentry.tests.e2e.hive.hiveserver.AbstractHiveServer.createConnection(AbstractHiveServer.java:63)
at 
org.apache.sentry.tests.e2e.hive.Context.createConnection(Context.java:92)
at 
org.apache.sentry.tests.e2e.hive.AbstractTestWithStaticConfiguration.setupAdmin(AbstractTestWithStaticConfiguration.java:472)
at 
org.apache.sentry.tests.e2e.dbprovider.TestDatabaseProvider.testGrantRevokeRoleToGroups(TestDatabaseProvider.java:2037)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14963//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14963//console

This message is automatically generated.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652314#comment-14652314
 ] 

Lei Chen commented on HBASE-13965:
--

thanks, I will update soon

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.





[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652270#comment-14652270
 ] 

Ted Yu commented on HBASE-13965:


@Lei :
The following file triggered release audit warning:
hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource

Take a look at 
./hbase-hadoop2-compat/src/test/resources/META-INF/services/org.apache.hadoop.hbase.HadoopShims
 to see how license is added.

I will commit the next patch with the above fix.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, 
 HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, 
 HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, 
 HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, 
 stochasticloadbalancerclasses_v2.png


 Today’s default HBase load balancer (the Stochastic load balancer) is cost 
 function based. The cost function weights are tunable but no visibility into 
 those cost function results is directly provided.
 A driving example is a cluster we have been tuning which has skewed rack size 
 (one rack has half the nodes of the other few racks). We are tuning the 
 cluster for uniform response time from all region servers with the ability to 
 tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and 
 RegionCountSkew Cost is difficult without a way to attribute each cost 
 function’s contribution to overall cost. 
 What this jira proposes is to provide visibility via JMX into each cost 
 function of the stochastic load balancer, as well as the overall cost of the 
 balancing plan.
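
To illustrate the kind of attribution the description asks for, here is a minimal Java sketch. It assumes the overall cost is a weighted sum of per-function costs; the class name, method names, weights, and cost values are all hypothetical and are not taken from the patch or from HBase's balancer classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: not HBase code. Each entry maps a cost function
// name to a {weight, cost} pair, with cost assumed to lie in [0, 1].
class CostBreakdown {
    /** Weighted sum of all per-function costs. */
    static double totalCost(Map<String, double[]> functions) {
        double total = 0.0;
        for (double[] weightAndCost : functions.values()) {
            total += weightAndCost[0] * weightAndCost[1];
        }
        return total;
    }

    /** Fraction of the total contributed by one function; 0 if unknown or total is 0. */
    static double share(Map<String, double[]> functions, String name) {
        double total = totalCost(functions);
        double[] weightAndCost = functions.get(name);
        if (weightAndCost == null || total == 0.0) {
            return 0.0;
        }
        return weightAndCost[0] * weightAndCost[1] / total;
    }

    public static void main(String[] args) {
        Map<String, double[]> f = new LinkedHashMap<>();
        f.put("Locality", new double[] {20.0, 0.25});          // made-up weight and cost
        f.put("RegionReplicaRack", new double[] {100.0, 0.0}); // made-up weight and cost
        f.put("RegionCountSkew", new double[] {10.0, 0.5});    // made-up weight and cost
        System.out.println("total=" + totalCost(f));
        for (String name : f.keySet()) {
            System.out.println(name + " share=" + share(f, name));
        }
    }
}
```

Exposing each function's share alongside the total is what would let an operator see which weight to adjust; how the patch actually publishes these values over JMX is not shown here.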





[jira] [Resolved] (HBASE-14131) HBase Backup/Restore Phase 2: Describe backup image

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14131.
---
Resolution: Implemented

See parent JIRA (HBASE-14123).

 HBase Backup/Restore Phase 2: Describe backup image
 ---

 Key: HBASE-14131
 URL: https://issues.apache.org/jira/browse/HBASE-14131
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov







[jira] [Resolved] (HBASE-14132) HBase Backup/Restore Phase 2: History of backups

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14132.
---
Resolution: Implemented

See parent JIRA (HBASE-14123).

 HBase Backup/Restore Phase 2: History of backups
 

 Key: HBASE-14132
 URL: https://issues.apache.org/jira/browse/HBASE-14132
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov







[jira] [Created] (HBASE-14180) Change timeout - SocketTimeoutException because of callTimeout

2015-08-03 Thread JIRA
Adrià V. created HBASE-14180:


 Summary: Change timeout - SocketTimeoutException because of 
callTimeout
 Key: HBASE-14180
 URL: https://issues.apache.org/jira/browse/HBASE-14180
 Project: HBase
  Issue Type: Bug
  Components: hbase, regionserver, rpc, Zookeeper
Affects Versions: 1.1.1
 Environment: Hadoop with Ambari 2.1.0
HBase 1.1.1.2.3
HDFS 2.7.1.2.3
Zookeeper 3.4.6.2.3
Phoenix 4.4.0.2.3
Reporter: Adrià V.


HBase keeps throwing a timeout exception. I have tried every configuration I 
could think of to increase the timeout.

Partial stacktrace:
{quote}
Caused by: java.io.IOException: Call to 
hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020 failed on local exception: 
org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, 
operationTimeout=6 expired.
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1242)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1210)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at 
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
at 
org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
at 
org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
at 
org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, 
waitTime=60001, operationTimeout=6 expired.
at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
at 
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184)
... 13 more
{quote}

I've tried editing the config files and also setting the following keys in 
Ambari to increase the timeout, with no success:
- hbase.rpc.timeout
- dfs.socket.timeout
- dfs.client.socket-timeout
- zookeeper.session.timeout

I also tried the following Phoenix properties, but I think it's mostly an HBase issue:
- phoenix.query.timeoutMs
- phoenix.query.keepAliveMs

Full stack trace: 
{quote}
Error: Encountered exception in sub plan [0] execution. (state=,code=0)
java.sql.SQLException: Encountered exception in sub plan [0] execution.
at 
org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157)
at 
org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251)
at 
org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at 
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240)
at 
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250)
at sqlline.Commands.execute(Commands.java:822)
at sqlline.Commands.sql(Commands.java:732)
at sqlline.SqlLine.dispatch(SqlLine.java:808)
at sqlline.SqlLine.begin(SqlLine.java:681)
at sqlline.SqlLine.start(SqlLine.java:398)
at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: org.apache.phoenix.exception.PhoenixIOException: 
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, 
exceptions:
Mon Aug 03 16:47:06 UTC 2015, null, java.net.SocketTimeoutException: 
callTimeout=6, callDuration=60303: row '' on table 'hive_post_topics' at 
region=hive_post_topics,,1438084107396.cdbdc246ff0b7dfed31d481e0bccd2b5., 
hostname=hdp-w-1.c.dks-hadoop.internal,16020,1438619912282, seqNum=45322

at 
org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
at 
org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:542)
at 
org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
at 
org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
at 
org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:106)
at 
org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82)
at 
org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:339)
at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:136)
at 

[jira] [Commented] (HBASE-12890) Provide a way to throttle the number of regions moved by the balancer

2015-08-03 Thread Dave Latham (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652192#comment-14652192
 ] 

Dave Latham commented on HBASE-12890:
-

Thanks, Ted and Andrew.  Can it be committed?

 Provide a way to throttle the number of regions moved by the balancer
 -

 Key: HBASE-12890
 URL: https://issues.apache.org/jira/browse/HBASE-12890
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.98.10
Reporter: churro morales
Assignee: churro morales
 Fix For: 2.0.0, 0.98.14, 1.3.0

 Attachments: HBASE-12890.patch


 We have a very large cluster and we frequently add and remove quite a few 
 regionservers from our cluster.  Whenever we do this, the balancer moves 
 thousands of regions at once.  Instead, we provide a configuration parameter: 
 hbase.balancer.max.regions.  This limits the number of regions that are 
 balanced per iteration.  
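
The throttle described above can be sketched in Java as a cap applied to the list of region moves produced in one balancer run. This is an illustration only: the class and method names are hypothetical, and treating a non-positive cap as "no limit" is an assumption of the sketch, not something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the HBASE-12890 patch. "plans" stands in for
// whatever objects describe the region moves of one balancer iteration.
class BalancerThrottle {
    /**
     * Returns at most maxRegions plans, taken from the front of the list.
     * A non-positive maxRegions is treated as "no limit" (assumption).
     */
    static <T> List<T> limitPlans(List<T> plans, int maxRegions) {
        if (maxRegions <= 0 || plans.size() <= maxRegions) {
            return new ArrayList<>(plans);
        }
        return new ArrayList<>(plans.subList(0, maxRegions));
    }

    public static void main(String[] args) {
        List<String> plans = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            plans.add("move-region-" + i);
        }
        // With a cap of 2, only the first two moves run in this iteration.
        System.out.println(limitPlans(plans, 2));
    }
}
```

The remaining moves would be reconsidered on the next balancer run, which is what spreads a large rebalance over several iterations.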





[jira] [Commented] (HBASE-14133) HBase Backup/Restore Phase 2: Status (and progress) of backup request

2015-08-03 Thread Vladimir Rodionov (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652098#comment-14652098
 ] 

Vladimir Rodionov commented on HBASE-14133:
---

See parent JIRA (HBASE-14123).

 HBase Backup/Restore Phase 2: Status (and progress) of backup request
 -

 Key: HBASE-14133
 URL: https://issues.apache.org/jira/browse/HBASE-14133
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov







[jira] [Resolved] (HBASE-14133) HBase Backup/Restore Phase 2: Status (and progress) of backup request

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14133.
---
Resolution: Implemented

 HBase Backup/Restore Phase 2: Status (and progress) of backup request
 -

 Key: HBASE-14133
 URL: https://issues.apache.org/jira/browse/HBASE-14133
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov







[jira] [Updated] (HBASE-14123) HBase Backup/Restore Phase 2

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov updated HBASE-14123:
--
Attachment: HBASE-14123-v1.patch

First patch, incorporates (HBASE-14125, HBASE-14130, HBASE-14131, HBASE-14132, 
HBASE-14133)

 HBase Backup/Restore Phase 2
 

 Key: HBASE-14123
 URL: https://issues.apache.org/jira/browse/HBASE-14123
 Project: HBase
  Issue Type: Umbrella
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov
 Attachments: HBASE-14123-v1.patch


 Phase 2 umbrella JIRA. See HBASE-7912 for design document and description. 





[jira] [Resolved] (HBASE-14125) HBase Backup/Restore Phase 2: Cancel backup

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14125.
---
Resolution: Implemented

See parent JIRA (HBASE-14123).

 HBase Backup/Restore Phase 2: Cancel backup
 ---

 Key: HBASE-14125
 URL: https://issues.apache.org/jira/browse/HBASE-14125
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov

 Cancel backup operation.





[jira] [Resolved] (HBASE-14130) HBase Backup/Restore Phase 2: Delete backup image

2015-08-03 Thread Vladimir Rodionov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Rodionov resolved HBASE-14130.
---
Resolution: Implemented

See parent JIRA (HBASE-14123).

 HBase Backup/Restore Phase 2: Delete backup image
 -

 Key: HBASE-14130
 URL: https://issues.apache.org/jira/browse/HBASE-14130
 Project: HBase
  Issue Type: New Feature
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov







[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock

2015-08-03 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen updated HBASE-14178:
--
Attachment: HBASE-14178_v4.patch

 regionserver blocks because of waiting for offsetLock
 -

 Key: HBASE-14178
 URL: https://issues.apache.org/jira/browse/HBASE-14178
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.98.6
Reporter: Heng Chen
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, 
 HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, 
 HBASE-14178_v4.patch, jstack


 My regionserver blocks, and all client RPCs time out. 
 I printed the regionserver's jstack; it seems a lot of threads were blocked 
 waiting for offsetLock. Detailed information is below.
 PS:  my table's block cache is off
 {code}
 B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 
 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
java.lang.Thread.State: WAITING (on object monitor)
 at java.lang.Object.wait(Native Method)
 at java.lang.Object.wait(Object.java:502)
 at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
 - locked 0x000773af7c18 (a 
 org.apache.hadoop.hbase.util.IdLock$Entry)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
 at 
 org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
 at 
 org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
 at 
 org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
 at 
 org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
 at 
 org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
 - locked 0x0005e5c55ad0 (a 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
 at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
 at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
 at 
 org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
 at 
 org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
 at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
 - 0x0005e5c55c08 (a 
 java.util.concurrent.locks.ReentrantLock$NonfairSync)
 {code}
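
The trace above shows a handler thread waiting in IdLock.getLockEntry, called from HFileReaderV2.readBlock, which suggests that reads of the same block offset are serialized on a per-offset lock. The Java sketch below illustrates that general idea with a simplified, non-blocking, non-reentrant lock table. It is hypothetical and is not HBase's IdLock implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of a per-offset lock table; not HBase's IdLock.
class OffsetLocks {
    // Maps a block offset to the thread currently holding its lock.
    private final ConcurrentMap<Long, Thread> owners = new ConcurrentHashMap<>();

    /** Tries to take the lock for an offset without blocking; true on success. */
    boolean tryLock(long offset) {
        return owners.putIfAbsent(offset, Thread.currentThread()) == null;
    }

    /** Releases the lock for an offset, but only if the calling thread holds it. */
    void unlock(long offset) {
        owners.remove(offset, Thread.currentThread());
    }

    public static void main(String[] args) {
        OffsetLocks locks = new OffsetLocks();
        System.out.println(locks.tryLock(4096L)); // first request for this offset
        System.out.println(locks.tryLock(4096L)); // same offset while still held
        System.out.println(locks.tryLock(8192L)); // a different offset
        locks.unlock(4096L);
        System.out.println(locks.tryLock(4096L)); // same offset after release
    }
}
```

In a blocking variant, the second request for a held offset would wait instead of returning false. If the holder never releases the entry, every later reader of that offset waits indefinitely, which would look like the WAITING handler threads in the jstack output.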





[jira] [Updated] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module

2015-08-03 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14150:

Attachment: HBASE-14150.1.patch

First draft of BulkLoad with Spark.

This patch includes:
1. HBaseContext Implementation
2. RDD Implicit Implementation
3. Unit Test

 Add BulkLoad functionality to HBase-Spark Module
 

 Key: HBASE-14150
 URL: https://issues.apache.org/jira/browse/HBASE-14150
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Attachments: HBASE-14150.1.patch


 Add on to the work done in HBASE-13992 to add functionality to do a bulk load 
 from a given RDD.
 This will do the following:
 1. Figure out the number of regions, then sort and partition the data correctly 
 to be written out to HFiles.
 2. Also, unlike the MR bulk load, I would like the columns to be sorted in 
 the shuffle stage and not in the memory of the reducer.  This will allow this 
 design to support super wide records without running out of memory.
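
The partition-then-sort step in item 1 can be sketched in plain Java, without Spark. The sketch assumes that region boundaries are given as a sorted array of start keys whose first entry is the empty string, and it compares keys as Java strings rather than as byte arrays. The class and method names are hypothetical and do not come from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: assigns row keys to regions by start key, then
// sorts the keys inside each region. Not the HBASE-14150 implementation.
class RegionPartitioner {
    /**
     * Index of the region that owns rowKey. startKeys must be sorted and
     * must begin with "" (the first region's empty start key).
     */
    static int partitionFor(String[] startKeys, String rowKey) {
        int pos = Arrays.binarySearch(startKeys, rowKey);
        // On a miss, binarySearch returns -(insertionPoint) - 1, and the
        // owning region is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Groups row keys by region and sorts each group. */
    static List<List<String>> partitionAndSort(String[] startKeys, List<String> rowKeys) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < startKeys.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            parts.get(partitionFor(startKeys, key)).add(key);
        }
        for (List<String> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] startKeys = {"", "g", "p"};
        List<String> rows = Arrays.asList("zebra", "apple", "grape", "pear", "fig");
        System.out.println(partitionAndSort(startKeys, rows));
    }
}
```

In Spark, partitioning and per-partition sorting can both happen during the shuffle (for example with `repartitionAndSortWithinPartitions`), which matches the goal in item 2 of not sorting in reducer memory. Whether the patch uses that operation is not stated in this message.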





[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Lei Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Chen updated HBASE-13965:
-
Attachment: HBASE-13965-v11.patch

Updates:
1. License added for 
{{hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource}}


 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png







[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics

2015-08-03 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-13965:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the patch, Lei.

 Stochastic Load Balancer JMX Metrics
 

 Key: HBASE-13965
 URL: https://issues.apache.org/jira/browse/HBASE-13965
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
 Fix For: 2.0.0, 1.3.0

 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, 
 HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, 
 HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, 
 HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, 
 HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png




