[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13965: --- Hadoop Flags: Reviewed Fix Version/s: 1.3.0 2.0.0 Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
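For illustration only, the attribution problem described in the issue can be sketched in a few lines of Java. The sketch assumes the overall cost is a weighted sum of per-function costs, each normalized to [0, 1]. The class name, the weights, and the raw cost values are hypothetical and are not taken from the patch or from StochasticLoadBalancer itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: not the HBase StochasticLoadBalancer code. */
final class CostBreakdownSketch {

    /** One cost function: a tunable weight and a raw cost assumed to lie in [0, 1]. */
    static final class WeightedCost {
        final double weight;
        final double rawCost;

        WeightedCost(double weight, double rawCost) {
            this.weight = weight;
            this.rawCost = rawCost;
        }

        /** Weighted contribution of this function to the overall cost. */
        double contribution() {
            return weight * rawCost;
        }
    }

    /** Overall cost, assumed here to be the sum of weight * rawCost. */
    static double overallCost(Map<String, WeightedCost> costs) {
        double total = 0.0;
        for (WeightedCost c : costs.values()) {
            total += c.contribution();
        }
        return total;
    }

    /** Fraction of the overall cost attributable to each function (0 if the total is 0). */
    static Map<String, Double> shares(Map<String, WeightedCost> costs) {
        double total = overallCost(costs);
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, WeightedCost> e : costs.entrySet()) {
            double share = total == 0.0 ? 0.0 : e.getValue().contribution() / total;
            out.put(e.getKey(), share);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative weights and raw costs only; not HBase defaults or measured values.
        Map<String, WeightedCost> costs = new LinkedHashMap<>();
        costs.put("LocalityCost", new WeightedCost(25.0, 0.2));
        costs.put("RegionReplicaRackCost", new WeightedCost(10000.0, 0.001));
        costs.put("RegionCountSkewCost", new WeightedCost(500.0, 0.05));

        System.out.println("overall cost = " + overallCost(costs));
        for (Map.Entry<String, Double> e : shares(costs).entrySet()) {
            System.out.println(e.getKey() + " share = " + e.getValue());
        }
    }
}
```

Publishing each function's contribution next to the overall cost is the kind of visibility the issue asks JMX to provide, since the weights alone do not show which function dominates a given plan.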
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v10.patch Updates: 1. Spelling and formatting 2. LOG level changed to error when failed to get size of all tables.
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651944#comment-14651944 ] Hadoop QA commented on HBASE-14178: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12748436/HBASE-14178_v4.patch against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133. ATTACHMENT ID: 12748436 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. 
There are 2 zombie test(s): at org.apache.hadoop.mapred.TestMRIntermediateDataEncryption.testMultipleReducers(TestMRIntermediateDataEncryption.java:70) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14962//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14962//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14962//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14962//console This message is automatically generated. regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack My regionserver blocks, and all client rpc timeout. 
I printed the regionserver's jstack. It seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off.
{code}
B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
	- locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
	at
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652037#comment-14652037 ] Ted Yu commented on HBASE-13965: Planning to commit once QA run passes.
[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652053#comment-14652053 ] Heng Chen commented on HBASE-14178: ---
{quote} You can see the call to cacheConf.shouldCacheBlockOnRead(expectedBlockType.getCategory()) checks wrt whether the read request says the block to be cached after this read. It is not telling abt the CF level setting of whether data to be cached at all or not. {quote}
cacheDataOnRead represents the CF-level cache setting. You can see that cacheDataOnRead is initialized in the CacheConfig constructor:
{code}
CacheConfig(final BlockCache blockCache,
    final boolean cacheDataOnRead, final boolean inMemory,
    final boolean cacheDataOnWrite, final boolean cacheIndexesOnWrite,
    final boolean cacheBloomsOnWrite, final boolean evictOnClose,
    final boolean cacheDataCompressed, final boolean prefetchOnOpen,
    final boolean cacheDataInL1) {
  this.blockCache = blockCache;
  this.cacheDataOnRead = cacheDataOnRead;
  this.inMemory = inMemory;
  this.cacheDataOnWrite = cacheDataOnWrite;
  this.cacheIndexesOnWrite = cacheIndexesOnWrite;
  this.cacheBloomsOnWrite = cacheBloomsOnWrite;
  this.evictOnClose = evictOnClose;
  this.cacheDataCompressed = cacheDataCompressed;
  this.prefetchOnOpen = prefetchOnOpen;
  this.cacheDataInL1 = cacheDataInL1;
  LOG.info(this);
}
{code}
This constructor is in turn called by another constructor, which passes family.isBlockCacheEnabled() as cacheDataOnRead:
{code}
public CacheConfig(Configuration conf, HColumnDescriptor family) {
  this(CacheConfig.instantiateBlockCache(conf),
      family.isBlockCacheEnabled(),
      family.isInMemory(),
      // For the following flags we enable them regardless of per-schema settings
      // if they are enabled in the global configuration.
      conf.getBoolean(CACHE_BLOCKS_ON_WRITE_KEY,
          DEFAULT_CACHE_DATA_ON_WRITE) || family.isCacheDataOnWrite(),
      conf.getBoolean(CACHE_INDEX_BLOCKS_ON_WRITE_KEY,
          DEFAULT_CACHE_INDEXES_ON_WRITE) || family.isCacheIndexesOnWrite(),
      conf.getBoolean(CACHE_BLOOM_BLOCKS_ON_WRITE_KEY,
          DEFAULT_CACHE_BLOOMS_ON_WRITE) || family.isCacheBloomsOnWrite(),
      conf.getBoolean(EVICT_BLOCKS_ON_CLOSE_KEY,
          DEFAULT_EVICT_ON_CLOSE) || family.isEvictBlocksOnClose(),
      conf.getBoolean(CACHE_DATA_BLOCKS_COMPRESSED_KEY, DEFAULT_CACHE_DATA_COMPRESSED),
      conf.getBoolean(PREFETCH_BLOCKS_ON_OPEN_KEY,
          DEFAULT_PREFETCH_ON_OPEN) || family.isPrefetchBlocksOnOpen(),
      conf.getBoolean(HColumnDescriptor.CACHE_DATA_IN_L1,
          HColumnDescriptor.DEFAULT_CACHE_DATA_IN_L1) || family.isCacheDataInL1()
  );
}
{code}
[jira] [Resolved] (HBASE-13864) HColumnDescriptor should parse the output from master and from describe for TTL
[ https://issues.apache.org/jira/browse/HBASE-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-13864. Resolution: Fixed Resolving. Can either reopen for backport or open new JIRA. HColumnDescriptor should parse the output from master and from describe for TTL --- Key: HBASE-13864 URL: https://issues.apache.org/jira/browse/HBASE-13864 Project: HBase Issue Type: Bug Components: shell Reporter: Elliott Clark Assignee: Ashu Pachauri Fix For: 2.0.0 Attachments: 13864-branch-1.txt, HBASE-13864-1.patch, HBASE-13864-2.patch, HBASE-13864-3.patch, HBASE-13864-4.patch, HBASE-13864.patch The TTL printing on HColumnDescriptor adds a human readable time. When using that string for the create command it throws an error. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
Ted Malaska created HBASE-14181: --- Summary: Add Spark DataFrame DataSource to HBase-Spark Module Key: HBASE-14181 URL: https://issues.apache.org/jira/browse/HBASE-14181 Project: HBase Issue Type: New Feature Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor Build a RelationProvider for HBase-Spark Module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652677#comment-14652677 ] Andrew Purtell commented on HBASE-14122: Any further concerns [~anoop.hbase] ? I read your comment as a lgtm plus a question, so will proceed with commit tomorrow, or please let me know if I am mistaken. Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
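A caller of such an API has three outcomes to handle: supported, not supported, and "the master cannot answer". The sketch below is a hypothetical illustration of that contract, not the API added by the patch. The BooleanSupplier stands in for the master RPC, and the class, enum, and method names are invented for the example.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a caller handling the true / false / unsupported contract. */
final class CellSecurityProbeSketch {

    /** The three outcomes a client may need to distinguish. */
    enum Support { SUPPORTED, NOT_SUPPORTED, UNKNOWN }

    /**
     * Invokes the supplied check, which stands in for the master RPC, and maps
     * UnsupportedOperationException (master lacks the call) to UNKNOWN.
     */
    static Support probe(BooleanSupplier masterCheck) {
        try {
            return masterCheck.getAsBoolean() ? Support.SUPPORTED : Support.NOT_SUPPORTED;
        } catch (UnsupportedOperationException e) {
            // An older master without the RPC: the client cannot tell either way.
            return Support.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(() -> true));
        System.out.println(probe(() -> false));
        System.out.println(probe(() -> {
            throw new UnsupportedOperationException("master does not support this call");
        }));
    }
}
```

Keeping the "unknown" case separate lets a client fall back to its previous behavior against an older master instead of treating a missing RPC as "security disabled".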
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652688#comment-14652688 ] Hudson commented on HBASE-13965: FAILURE: Integrated in HBase-TRUNK #6694 (See [https://builds.apache.org/job/HBase-TRUNK/6694/]) HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 20d1fa36e7ffa1c8d274def831223bff9b04fa69) * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev 598cfeb77563a3fea9d0ed467025514662e52ca0) * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/Filter.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-client/src/main/java/org/apache/hadoop/hbase/filter/FilterWrapper.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652765#comment-14652765 ] Lei Chen commented on HBASE-13965: -- +1 Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png
[jira] [Comment Edited] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652809#comment-14652809 ] Andrew Purtell edited comment on HBASE-14122 at 8/3/15 11:56 PM: - bq. Should all these be refactored to use the new master API for checking security support? Let me look into that. Good suggestion. Would changing how/if exceptions are thrown when using the AccessControlClient and VisibilityClient be a backwards compatibility concern? At least with the shell, we can avoid ugly nits by checking security feature flags in advance if the API is available. That would also handle the case where the new master API isn't available. See what the shell does for the new list_security_capabilities command. was (Author: apurtell): bq. Should all these be refactored to use the new master API for checking security support? Let me look into that. Good suggestion. Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13830) Hbase REVERSED may throw Exception sometimes
[ https://issues.apache.org/jira/browse/HBASE-13830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652873#comment-14652873 ] Ben Lau commented on HBASE-13830: - Hey Ryan, do you have more information on this bug? We are interested in using the reverse scan feature at Yahoo and would like to clear up any known bugs before internal users take it up for production use. If you have, for example, an independent program and/or data that could be used to reproduce this issue, we would like to see it. If you cannot reproduce the bug anymore, we'd like to know anything else you remember, like the version of HDFS, any custom patches you had on your version of HBase, the table schema at the time (e.g. any particular block encodings), etc. Hbase REVERSED may throw Exception sometimes Key: HBASE-13830 URL: https://issues.apache.org/jira/browse/HBASE-13830 Project: HBase Issue Type: Bug Affects Versions: 0.98.1 Reporter: ryan.jin Running the following scan in the HBase shell: {code} scan 'analytics_access',{ENDROW=>'9223370603647713262-flume01.hadoop-10.32.117.111-373563509',LIMIT=>10,REVERSED=>true} {code} throws this exception: {code} java.io.IOException: java.io.IOException: Could not seekToPreviousRow StoreFileScanner[HFileScanner for reader reader=hdfs://nameservice1/hbase/data/default/analytics_access/a54c47c568c00dd07f9d92cfab1accc7/cf/2e3a107e9fec4930859e992b61fb22f6, compression=lzo, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=9223370603542781142-flume01.hadoop-10.32.117.111-378180911/cf:key/1433311994702/Put, lastKey=9223370603715515112-flume01.hadoop-10.32.117.111-370923552/cf:timestamp/1433139261951/Put, avgKeyLen=80, avgValueLen=115, entries=43544340, length=1409247455, cur=9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0] to key 
9223370603647710245-flume01.hadoop-10.32.117.111-373563545/cf:payload/1433207065597/Put/vlen=644/mvcc=0 at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekToPreviousRow(StoreFileScanner.java:448) at org.apache.hadoop.hbase.regionserver.ReversedKeyValueHeap.seekToPreviousRow(ReversedKeyValueHeap.java:88) at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToPreviousRow(ReversedStoreScanner.java:128) at org.apache.hadoop.hbase.regionserver.ReversedStoreScanner.seekToNextRow(ReversedStoreScanner.java:88) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:503) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3866) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3946) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3814) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3805) at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3136) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2012) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:160) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:38) at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:110) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: On-disk size without header provided is 47701, but block header contains 10134. 
Block offset: -1, data starts with: DATABLK*\x00\x00'\x96\x00\x01\x00\x04\x00\x00\x00\x005\x96^\xD2\x01\x00\x00@\x00\x00\x00' at org.apache.hadoop.hbase.io.hfile.HFileBlock.validateOnDiskSizeWithoutHeader(HFileBlock.java:451) at org.apache.hadoop.hbase.io.hfile.HFileBlock.access$400(HFileBlock.java:87) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1466) at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1314) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:355) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekBefore(HFileReaderV2.java:569) at
[jira] [Resolved] (HBASE-14180) Change timeout - SocketTimeoutException because of callTimeout
[ https://issues.apache.org/jira/browse/HBASE-14180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-14180. Resolution: Invalid Please write in to u...@hbase.apache.org for help troubleshooting issues. This is the project dev tracker. Thanks! Change timeout - SocketTimeoutException because of callTimeout -- Key: HBASE-14180 URL: https://issues.apache.org/jira/browse/HBASE-14180 Project: HBase Issue Type: Bug Components: hbase, regionserver, rpc, Zookeeper Affects Versions: 1.1.1 Environment: Hadoop with Ambari 2.1.0 HBase 1.1.1.2.3 HDFS 2.7.1.2.3 Zookeeper 3.4.6.2.3 Phoenix 4.4.0.2.3 Reporter: Adrià V. HBase keeps throwing a timeout exception. I have tried every configuration I could think of to increase the timeout. Partial stacktrace: {quote} Caused by: java.io.IOException: Call to hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, operationTimeout=6 expired. 
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1242)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1210)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
	at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
	... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, operationTimeout=6 expired.
	at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
	at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184)
	... 13 more
{quote}
I've tried editing the config files and also setting the following keys in Ambari to increase the timeout, with no success:
- hbase.rpc.timeout
- dfs.socket.timeout
- dfs.client.socket-timeout
- zookeeper.session.timeout
I also tried the Phoenix properties, but I think it's mostly an HBase issue:
- phoenix.query.timeoutMs
- phoenix.query.keepAliveMs
Full stack trace:
{quote}
Error: Encountered exception in sub plan [0] execution. (state=,code=0)
java.sql.SQLException: Encountered exception in sub plan [0] execution.
at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157) at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251) at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241) at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240) at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250) at sqlline.Commands.execute(Commands.java:822) at sqlline.Commands.sql(Commands.java:732) at sqlline.SqlLine.dispatch(SqlLine.java:808) at sqlline.SqlLine.begin(SqlLine.java:681) at sqlline.SqlLine.start(SqlLine.java:398) at sqlline.SqlLine.main(SqlLine.java:292) Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Mon Aug 03 16:47:06 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=60303: row '' on table 'hive_post_topics' at region=hive_post_topics,,1438084107396.cdbdc246ff0b7dfed31d481e0bccd2b5., hostname=hdp-w-1.c.dks-hadoop.internal,16020,1438619912282, seqNum=45322 at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108) at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:542) at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176) at
[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652809#comment-14652809 ] Andrew Purtell commented on HBASE-14122: bq. Should all these be refactored to use the new master API for checking security support? Let me look into that. Good suggestion. Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13965: --- Attachment: 13965-addendum.txt Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12712) skipLargeFiles in minor compact but not in major compact
[ https://issues.apache.org/jira/browse/HBASE-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12712: --- Status: Open (was: Patch Available) skipLargeFiles in minor compact but not in major compact Key: HBASE-12712 URL: https://issues.apache.org/jira/browse/HBASE-12712 Project: HBase Issue Type: New Feature Components: Compaction Affects Versions: 0.98.6 Reporter: Liu Junhong Labels: beginner Fix For: 0.98.6 Attachments: compact.diff Original Estimate: 72h Remaining Estimate: 72h Here is my case. After repeated minor compactions, a storefile becomes very large. Compacting a large storefile wastes a lot of bandwidth, so I use “hbase.hstore.compaction.max.size” to skip such files. But after setting this config, I found from reading the source code that major compaction will then be skipped forever, so deletes and multi-version data may waste storage. So I had to modify the code. I am now trying to submit my patch, but it is not perfect. I think there should be another config in HColumnDescriptor that determines whether large storefiles take part in major compaction. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12815) Deprecate 0.89-fb specific Data Structures like KeyValue, WALEdit etc
[ https://issues.apache.org/jira/browse/HBASE-12815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12815: --- Status: Open (was: Patch Available) Deprecate 0.89-fb specific Data Structures like KeyValue, WALEdit etc - Key: HBASE-12815 URL: https://issues.apache.org/jira/browse/HBASE-12815 Project: HBase Issue Type: Sub-task Components: wal Reporter: Rishit Shroff Assignee: Rishit Shroff Priority: Minor Attachments: 0001-HBASE-12815-Remove-HBase-specific-Data-structures-li.patch OSS HBase has different versions of these data structures, and the current module was retaining the old ones from 0.89-fb. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652936#comment-14652936 ] Hudson commented on HBASE-13965: SUCCESS: Integrated in HBase-1.3-IT #68 (See [https://builds.apache.org/job/HBase-1.3-IT/68/]) HBASE-13965 Revert due to test failure in TestAssignmentManager (tedyu: rev 24dbe25e95d0a355b2e07aa94b5921ff4b4865e9) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, 
metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652658#comment-14652658 ] Hadoop QA commented on HBASE-13965: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12748522/HBASE-13965-v11.patch against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133. ATTACHMENT ID: 12748522 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14964//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14964//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14964//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14964//console This message is automatically generated. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13965: --- Fix Version/s: (was: 1.3.0) Test failure in TestAssignmentManager is reproducible. Reverted from branch-1 for now. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12865) WALs may be deleted before they are replicated to peers
[ https://issues.apache.org/jira/browse/HBASE-12865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652735#comment-14652735 ] Andrew Purtell commented on HBASE-12865: Sorry this has sat for a while. Handles KeeperExceptions better. The new unit test testFailoverDeadServerCversionChange verifies the ZK behavior we are expecting. Could go in as an improvement. Nice to have: a unit test that confirms we use the queues znode cversion correctly. WALs may be deleted before they are replicated to peers --- Key: HBASE-12865 URL: https://issues.apache.org/jira/browse/HBASE-12865 Project: HBase Issue Type: Bug Components: Replication Reporter: Liu Shaohui Assignee: He Liangliang Priority: Critical Attachments: HBASE-12865-V1.diff, HBASE-12865-V2.diff
By design, ReplicationLogCleaner guarantees that WALs still in a replication queue can't be deleted by the HMaster. The ReplicationLogCleaner gets the WAL set from ZooKeeper by scanning the replication zk node. But it may get an incomplete WAL set during replication failover, because the scan operation is not atomic.
For example: there are three region servers, rs1, rs2 and rs3, and peer id 10. The layout of the replication zookeeper nodes is:
{code}
/hbase/replication/rs/rs1/10/wals
                     /rs2/10/wals
                     /rs3/10/wals
{code}
- t1: the ReplicationLogCleaner finishes scanning the replication queue of rs1 and starts to scan the queue of rs2.
- t2: region server rs3 goes down, and rs1 takes over rs3's replication queue. The new layout is
{code}
/hbase/replication/rs/rs1/10/wals
                     /rs1/10-rs3/wals
                     /rs2/10/wals
                     /rs3
{code}
- t3: the ReplicationLogCleaner finishes scanning the queue of rs2 and starts to scan the node of rs3. But the queue has been moved to replication/rs1/10-rs3/WALS
So the ReplicationLogCleaner will miss the WALs of rs3 for peer 10, and the HMaster may delete these WALs before they are replicated to peer clusters.
We encountered this problem in our cluster and I think it's a serious bug for replication. Suggestions to fix this bug are welcome. thx~ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
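One generic way to guard a non-atomic scan like the one in the t1 to t3 timeline is an optimistic read: note a change counter before the scan, scan, and accept the result only if the counter is unchanged afterwards. The sketch below assumes the counter is something like the cversion of the queues znode mentioned in the comment; the message does not spell out how the patch actually uses it, and `scanConsistently` and its two suppliers are hypothetical stand-ins rather than HBase or ZooKeeper APIs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

final class ConsistentScanSketch {
  // Repeats the scan until the change counter is the same before and after it.
  // 'cversion' and 'scan' are hypothetical stand-ins for ZooKeeper reads.
  static Set<String> scanConsistently(IntSupplier cversion,
                                      Supplier<Set<String>> scan,
                                      int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int before = cversion.getAsInt();
      Set<String> wals = scan.get();
      if (cversion.getAsInt() == before) {
        return wals; // no queue was added or moved while we were scanning
      }
    }
    throw new IllegalStateException("queue layout kept changing during scan");
  }

  public static void main(String[] args) {
    int[] version = {0};
    int[] scans = {0};
    // The first scan races with a simulated failover and misses rs3's WAL;
    // the counter moves, so the scan is repeated and the second pass sees it.
    Set<String> wals = scanConsistently(
        () -> version[0],
        () -> {
          scans[0]++;
          Set<String> seen = new HashSet<String>();
          seen.add("rs1-wal");
          if (scans[0] == 1) {
            version[0]++;
          } else {
            seen.add("rs3-wal");
          }
          return seen;
        },
        3);
    System.out.println(wals.contains("rs3-wal") + " after " + scans[0] + " scans");
  }
}
```

The caller should treat the thrown exception as "delete nothing this round", since an unverified WAL set is exactly what the report warns about.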
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652947#comment-14652947 ] Hudson commented on HBASE-13965: SUCCESS: Integrated in HBase-1.3 #86 (See [https://builds.apache.org/job/HBase-1.3/86/]) HBASE-13965 Revert due to test failure in TestAssignmentManager (tedyu: rev 24dbe25e95d0a355b2e07aa94b5921ff4b4865e9) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics 
Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13986) HMaster instance always returns false for isAborted() check.
[ https://issues.apache.org/jira/browse/HBASE-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13986: --- Status: Open (was: Patch Available) HMaster instance always returns false for isAborted() check. Key: HBASE-13986 URL: https://issues.apache.org/jira/browse/HBASE-13986 Project: HBase Issue Type: Bug Reporter: Abhishek Kumar Assignee: Abhishek Kumar Priority: Minor Attachments: HBASE-13986.patch It seems that HMaster never sets the abortRequested flag to true, as HRegionServer does in its abort() method. The isAborted() method is used in a few places for an HMaster instance (for example in HMasterCommandLine.startMaster), where the code flow is determined by the result of the isAborted() call. We can set the abortRequested flag in HMaster's abort() method as well, as HRegionServer's abort() method does. Let me know if that seems OK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
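The effect described in this report can be shown with a small stand-alone sketch. Only the names abortRequested, abort() and isAborted() come from the message; the classes `Server`, `MasterAsReported` and `MasterAsProposed` are hypothetical stand-ins, not the real HMaster or HRegionServer code.

```java
final class AbortFlagSketch {
  // Stand-in for the server code that owns the flag.
  static class Server {
    private volatile boolean abortRequested = false;

    void abort(String reason) {
      abortRequested = true; // record the request before shutting down
    }

    boolean isAborted() {
      return abortRequested;
    }
  }

  // Reported behavior: abort() does its work but never records the request.
  static class MasterAsReported extends Server {
    @Override
    void abort(String reason) {
      // shutdown work would go here; the flag is left untouched
    }
  }

  // Proposed behavior: set the flag as well, then continue with shutdown.
  static class MasterAsProposed extends Server {
    @Override
    void abort(String reason) {
      super.abort(reason);
      // shutdown work would go here
    }
  }

  public static void main(String[] args) {
    Server reported = new MasterAsReported();
    Server proposed = new MasterAsProposed();
    reported.abort("demo");
    proposed.abort("demo");
    System.out.println("reported.isAborted() = " + reported.isAborted());
    System.out.println("proposed.isAborted() = " + proposed.isAborted());
  }
}
```

Any caller that branches on isAborted(), as the report says HMasterCommandLine.startMaster does, takes the wrong path in the first variant.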
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652654#comment-14652654 ] Hudson commented on HBASE-13965: FAILURE: Integrated in HBase-1.3 #85 (See [https://builds.apache.org/job/HBase-1.3/85/]) HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev c215b900f49685989083df3786bd8441700c248a) * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics 
Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-13965: Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13881) Bug in HTable#incrementColumnValue implementation
[ https://issues.apache.org/jira/browse/HBASE-13881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652710#comment-14652710 ] Nick Dimiduk commented on HBASE-13881: -- I believe this ticket warrants a release note. Bug in HTable#incrementColumnValue implementation - Key: HBASE-13881 URL: https://issues.apache.org/jira/browse/HBASE-13881 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.98.6.1, 1.0.1 Reporter: Jerry Lam Assignee: Gabor Liptak Fix For: 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13881.branch-1.1.patch
The exact method I'm talking about is:
{code}
@Deprecated
@Override
public long incrementColumnValue(final byte [] row, final byte [] family,
    final byte [] qualifier, final long amount, final boolean writeToWAL)
    throws IOException {
  return incrementColumnValue(row, family, qualifier, amount,
      writeToWAL? Durability.SKIP_WAL: Durability.USE_DEFAULT);
}
{code}
When writeToWAL is set to true, Durability is set to SKIP_WAL, which does not make much sense unless the meaning of SKIP_WAL is negated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
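To make the inversion concrete, here is a minimal sketch that puts the mapping quoted in the report next to the mapping the parameter name suggests. The `Durability` enum below is a local stand-in reduced to the two values in the snippet so the example compiles on its own, and `expected` is an assumption about the intended behavior, not a quote from the committed patch.

```java
// Local stand-in for org.apache.hadoop.hbase.client.Durability (two values only).
enum Durability { USE_DEFAULT, SKIP_WAL }

final class WriteToWalMapping {
  // Mapping as quoted in the report: writeToWAL == true yields SKIP_WAL.
  static Durability reported(boolean writeToWAL) {
    return writeToWAL ? Durability.SKIP_WAL : Durability.USE_DEFAULT;
  }

  // Mapping the flag name suggests (assumed intent): only skip the WAL
  // when the caller asked not to write to it.
  static Durability expected(boolean writeToWAL) {
    return writeToWAL ? Durability.USE_DEFAULT : Durability.SKIP_WAL;
  }

  public static void main(String[] args) {
    for (boolean flag : new boolean[] {true, false}) {
      System.out.println("writeToWAL=" + flag
          + " reported=" + reported(flag)
          + " expected=" + expected(flag));
    }
  }
}
```

The two methods disagree for both inputs, which is why a caller passing true silently loses WAL protection under the reported mapping.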
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652731#comment-14652731 ] Ted Malaska commented on HBASE-14181: - Note that a DataSource in Spark can offer a lot of advanced functionality, such as filter push down, scan range push down, and column filters. This Jira will try to get a base implementation down, but will leave room for more advanced functionality in additional jiras. Ted Malaska Add Spark DataFrame DataSource to HBase-Spark Module Key: HBASE-14181 URL: https://issues.apache.org/jira/browse/HBASE-14181 Project: HBase Issue Type: New Feature Reporter: Ted Malaska Assignee: Ted Malaska Priority: Minor Build a RelationProvider for HBase-Spark Module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652796#comment-14652796 ] Jerry He commented on HBASE-14122: -- Looks good overall. We have AccessControlClient and VisibilityClient, which both go directly to the coprocessor security endpoints. If calls are made while the security endpoints are not installed, they get an exception like:
org.apache.hadoop.hbase.exceptions.UnknownProtocolException: No registered coprocessor service found for name ...
On the shell side, the security commands (grant, revoke) error out based on the non-existence of the 'hbase:acl' table:
DISABLED: Security features are not available
The visibility command errors out based on the non-existence of the 'hbase:labels' table:
DISABLED: Visibility labels feature is not available
Should all these be refactored to use the new master API for checking security support?
Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
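The contract in the issue description has three outcomes (true, false, or UnsupportedOperationException), and a caller such as the shell commands discussed above would need to handle all three. The sketch below shows one way to fold them into a single value; `probe`, `Support` and the `BooleanSupplier` argument are hypothetical illustrations, since the message does not name the actual client method.

```java
import java.util.function.BooleanSupplier;

final class CellSecurityCheckSketch {
  enum Support { SUPPORTED, NOT_SUPPORTED, UNKNOWN }

  // 'masterCall' is a hypothetical stand-in for the new client API call.
  static Support probe(BooleanSupplier masterCall) {
    try {
      return masterCall.getAsBoolean() ? Support.SUPPORTED : Support.NOT_SUPPORTED;
    } catch (UnsupportedOperationException e) {
      // The master predates the RPC, so the answer cannot be determined this way.
      return Support.UNKNOWN;
    }
  }

  public static void main(String[] args) {
    System.out.println(probe(() -> true));
    System.out.println(probe(() -> false));
    System.out.println(probe(() -> {
      throw new UnsupportedOperationException("old master");
    }));
  }
}
```

Keeping UNKNOWN distinct from NOT_SUPPORTED lets a caller fall back to an older check, such as looking for the 'hbase:acl' table, instead of reporting the feature as disabled.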
[jira] [Commented] (HBASE-14179) catch the same Exception twice
[ https://issues.apache.org/jira/browse/HBASE-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652797#comment-14652797 ] Gabor Liptak commented on HBASE-14179: -- The outer catch is for ZKUtil.getData(this.watcher, nodePath), which also throws InterruptedException. Both catches are needed. catch the same Exception twice -- Key: HBASE-14179 URL: https://issues.apache.org/jira/browse/HBASE-14179 Project: HBase Issue Type: Bug Affects Versions: 1.0.1, 1.1.0, 1.1.1, 1.1.0.1 Reporter: songwanging Priority: Minor In method markRegionsRecovering() of class: hbase-1.1.1\hbase-server\src\main\java\org\apache\hadoop\hbase\coordination\ZKSplitLogManagerCoordination.java InterruptedException is caught twice. public void markRegionsRecovering(final ServerName serverName, Set<HRegionInfo> userRegions) throws IOException, InterruptedIOException { ... try { Thread.sleep(20); } catch (InterruptedException e1) { throw new InterruptedIOException(); } } catch (InterruptedException e) { throw new InterruptedIOException(); } ... } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
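For readers following the HBASE-14179 thread, here is a minimal, self-contained sketch of why both catch clauses are needed. It is not the HBase method: fetchData is a hypothetical stand-in for a call such as ZKUtil.getData that declares InterruptedException, and the class and method names are made up for illustration.

```java
import java.io.InterruptedIOException;

public class NestedInterruptDemo {
    /** Hypothetical stand-in for a blocking call that declares InterruptedException. */
    public static byte[] fetchData(boolean interrupt) throws InterruptedException {
        if (interrupt) {
            throw new InterruptedException("interrupted while fetching");
        }
        return new byte[0];
    }

    /** Returns which catch clause handled the interruption, or "none". */
    public static String run(boolean interruptFetch, boolean interruptSleep) {
        try {
            try {
                fetchData(interruptFetch);            // covered only by the outer catch
                try {
                    if (interruptSleep) {
                        // Set the interrupt flag so that sleep() throws immediately.
                        Thread.currentThread().interrupt();
                    }
                    Thread.sleep(20);
                } catch (InterruptedException e1) {   // inner: covers only the sleep
                    throw new InterruptedIOException("inner");
                }
            } catch (InterruptedException e) {        // outer: covers fetchData
                throw new InterruptedIOException("outer");
            }
        } catch (InterruptedIOException ioe) {
            return ioe.getMessage();
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(run(true, false));
        System.out.println(run(false, true));
        System.out.println(run(false, false));
    }
}
```

In this sketch, removing the outer catch would leave the checked InterruptedException from fetchData unhandled, so the code would no longer compile unless the enclosing method declared it.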
[jira] [Updated] (HBASE-12923) ResultScanner is not closed in ModifyTableHandler#removeReplicaColumnsIfNeeded()
[ https://issues.apache.org/jira/browse/HBASE-12923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-12923: --- Status: Open (was: Patch Available) ResultScanner is not closed in ModifyTableHandler#removeReplicaColumnsIfNeeded() Key: HBASE-12923 URL: https://issues.apache.org/jira/browse/HBASE-12923 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Trivial Attachments: 12923-v1.txt In ModifyTableHandler#removeReplicaColumnsIfNeeded(): {code} ResultScanner resScanner = metaTable.getScanner(scan); for (Result result : resScanner) { {code} The ResultScanner is not closed upon exit from the method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
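One common way to address a leak like the one reported in HBASE-12923 is try-with-resources. The sketch below is an assumption about the general shape of such a fix, not the attached patch: FakeScanner is a hypothetical stand-in for a closeable scanner, so the example runs without HBase on the classpath.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ScannerCloseDemo {
    /** Hypothetical stand-in for a scanner: iterable, and must be closed after use. */
    public static class FakeScanner implements Iterable<String>, Closeable {
        private final List<String> rows;
        public boolean closed = false;

        public FakeScanner(String... rows) {
            this.rows = Arrays.asList(rows);
        }

        @Override
        public Iterator<String> iterator() {
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Counts rows; try-with-resources closes the scanner on every exit path. */
    public static int countRows(FakeScanner scanner) {
        int count = 0;
        try (FakeScanner s = scanner) {
            for (String row : s) {
                if (row.isEmpty()) {
                    throw new IllegalStateException("empty row");
                }
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner("r1", "r2");
        System.out.println(countRows(scanner) + " rows, closed=" + scanner.closed);
    }
}
```

The scanner is closed whether the loop finishes normally or an exception escapes from it, which is the property the report says is missing.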
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652650#comment-14652650 ] Hudson commented on HBASE-13965: SUCCESS: Integrated in HBase-1.3-IT #67 (See [https://builds.apache.org/job/HBase-1.3-IT/67/]) HBASE-13965 Stochastic Load Balancer JMX Metrics (Lei Chen) (tedyu: rev c215b900f49685989083df3786bd8441700c248a) * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, 
metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
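The attribution problem described in the issue can be illustrated with a small sketch. It assumes, as a simplification, that the overall cost is a weighted sum of the individual cost function results; the class name, function labels, weights and costs below are made up for illustration and are not taken from StochasticLoadBalancer or from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CostBreakdownDemo {
    /** Overall cost as the weighted sum of the individual cost function results. */
    public static double totalCost(Map<String, double[]> functions) {
        double total = 0.0;
        for (double[] weightAndCost : functions.values()) {
            total += weightAndCost[0] * weightAndCost[1];
        }
        return total;
    }

    /** Fraction of the overall cost contributed by each function (0 if total is 0). */
    public static Map<String, Double> contributions(Map<String, double[]> functions) {
        double total = totalCost(functions);
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : functions.entrySet()) {
            double weighted = e.getValue()[0] * e.getValue()[1];
            result.put(e.getKey(), total == 0.0 ? 0.0 : weighted / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // Each entry is {weight, cost}; the values are illustrative only.
        Map<String, double[]> functions = new LinkedHashMap<>();
        functions.put("RegionCountSkew", new double[] {500.0, 0.10});
        functions.put("Locality", new double[] {25.0, 0.40});
        functions.put("RegionReplicaRack", new double[] {10000.0, 0.00});
        System.out.println("total=" + totalCost(functions));
        System.out.println(contributions(functions));
    }
}
```

Exposing the per-function values alongside the total, as the issue proposes to do via JMX, is what makes this kind of breakdown possible when tuning weights.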
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13965: --- Attachment: (was: 13965-addendum.txt) Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652720#comment-14652720 ] Ted Yu commented on HBASE-13965: Addendum deals with the scenario where connector port has been taken on the test machine. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653069#comment-14653069 ] Jerry He commented on HBASE-14122: -- Another comment. You seem to be using 'UnsupportedOperationException' for backward compatibility, depending on it being thrown by the RPC facility if the method cannot be located on the server side? I have not seen such an example before in the HBase code. This is probably OK. Usually we explicitly construct the 'UnsupportedOperationException' at the user code level. Does it work fine? Will the exception be correctly propagated to the client? Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
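The contract in the issue description (true, false, or UnsupportedOperationException from an older master) suggests a caller-side pattern along the following lines. This is only a sketch under stated assumptions: SecurityProbe and its method are hypothetical names standing in for the new client API, whose real signature is not shown in this thread.

```java
public class SecurityCapabilityDemo {
    /** Hypothetical stand-in for the master-side check described in HBASE-14122. */
    public interface SecurityProbe {
        boolean isCellLevelSecuritySupported();
    }

    public enum Support { AVAILABLE, UNAVAILABLE, UNKNOWN_OLD_SERVER }

    /** Maps the three documented outcomes (true, false, exception) to one result. */
    public static Support check(SecurityProbe probe) {
        try {
            return probe.isCellLevelSecuritySupported()
                ? Support.AVAILABLE
                : Support.UNAVAILABLE;
        } catch (UnsupportedOperationException e) {
            // The server predates the RPC; the caller has to fall back to another check.
            return Support.UNKNOWN_OLD_SERVER;
        }
    }

    public static void main(String[] args) {
        System.out.println(check(() -> true));
        System.out.println(check(() -> false));
        System.out.println(check(() -> {
            throw new UnsupportedOperationException("RPC not supported by master");
        }));
    }
}
```

Treating the exception as a third, distinct outcome keeps "feature absent" separate from "server too old to answer", which is the distinction the backward-compatibility question above is about.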
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Attachment: HBASE-14178-0.98.patch Uploaded patch for branch 0.98. regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack My regionserver blocks, and all client RPCs time out. I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off. {code} B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79) - locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352) at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524) at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257) at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173) at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55) at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269) at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695) at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683) at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533) at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820) - locked 0x0005e5c55ad0 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl) at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779) at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753) at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108) at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114) at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - 0x0005e5c55c08 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Attachment: (was: HBASE-14178-0.98.patch) regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14122) Client API for determining if server side supports cell level security
[ https://issues.apache.org/jira/browse/HBASE-14122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653053#comment-14653053 ] Anoop Sam John commented on HBASE-14122: bq. Any further concerns Anoop Sam John? I read your comment as a lgtm plus a question, so will proceed with commit tomorrow. Sorry for not giving an explicit +1 earlier. Yes, that was a minor question and you have addressed it already. I am +1. It would be nice to have the change Jerry suggested, but even without that I am +1 :-) Client API for determining if server side supports cell level security -- Key: HBASE-14122 URL: https://issues.apache.org/jira/browse/HBASE-14122 Project: HBase Issue Type: Improvement Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Minor Fix For: 2.0.0, 0.98.14, 1.2.0, 1.3.0 Attachments: HBASE-14122-0.98.patch, HBASE-14122-branch-1.patch, HBASE-14122.patch, HBASE-14122.patch Add a client API for determining if the server side supports cell level security. Ask the master, assuming as we do in many other instances that the master and regionservers all have a consistent view of site configuration. Return {{true}} if all features required for cell level security are present, {{false}} otherwise, or throw {{UnsupportedOperationException}} if the master does not have support for the RPC call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14179) catch the same Exception twice
[ https://issues.apache.org/jira/browse/HBASE-14179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John resolved HBASE-14179. Resolution: Invalid catch the same Exception twice -- Key: HBASE-14179 URL: https://issues.apache.org/jira/browse/HBASE-14179 Project: HBase Issue Type: Bug Affects Versions: 1.0.1, 1.1.0, 1.1.1, 1.1.0.1 Reporter: songwanging Priority: Minor In method markRegionsRecovering() of class: hbase-1.1.1\hbase-server\src\main\java\org\apache\hadoop\hbase\coordination\ZKSplitLogManagerCoordination.java InterruptedException is caught twice. public void markRegionsRecovering(final ServerName serverName, Set<HRegionInfo> userRegions) throws IOException, InterruptedIOException { ... try { Thread.sleep(20); } catch (InterruptedException e1) { throw new InterruptedIOException(); } } catch (InterruptedException e) { throw new InterruptedIOException(); } ... } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Status: Open (was: Patch Available) regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Status: Patch Available (was: Open) regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-13825: --- Attachment: HBASE-13825-branch-1.patch HBASE-13825-0.98.patch HBASE-13825.patch I followed the reference to HBASE-14076 over to HBASE-13230. The solution there is to use the static helper ProtobufUtil#mergeDelimitedFrom wherever we've written a delimited message and would use mergeDelimitedFrom to read it back in, since the delimited message format begins with the total message size encoded as a varint32. We use the encoded size to adjust the CodedInputStream limit as needed. Patches here also address relevant uses of builder#mergeFrom. We use Integer.MAX_VALUE as the size limit for CodedInputStream where the size is not known. In some places it's unlikely a message processed there will exceed 64 MB, but I made a change anyway. It is harmless and consistent to use ProtobufUtil#mergeFrom. branch-1 and 0.98 patches also incorporate HBASE-14076. Reviewboard: https://reviews.apache.org/r/37062/ /cc [~stack] Touched a lot of your code here. Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani Assignee: Andrew Purtell Fix For: 2.0.0, 0.98.14, 1.0.2, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13825-0.98.patch, HBASE-13825-branch-1.patch, HBASE-13825.patch When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Scan and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
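The framing idea mentioned in the HBASE-13825 comment (a delimited message starts with its total size as a varint, and the reader sizes its limit from that prefix) can be sketched without the protobuf library. The code below is an illustration only: the class and method names are hypothetical, and it does not reproduce ProtobufUtil#mergeDelimitedFrom or CodedInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DelimitedFrameDemo {
    /** Writes a base-128 varint length prefix followed by the payload. */
    public static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int value = payload.length;
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);   // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value);
        out.write(payload);
    }

    /** Reads the varint length prefix; at most 5 bytes are needed for 32 bits. */
    public static int readVarint32(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside length prefix");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed varint32");
    }

    /** Reads one frame, sizing the read from the prefix instead of a fixed cap. */
    public static byte[] readDelimited(InputStream in) throws IOException {
        int size = readVarint32(in);
        if (size < 0) {
            throw new IOException("negative frame size");
        }
        byte[] payload = new byte[size];
        new DataInputStream(in).readFully(payload);
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[300];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDelimited(out, payload);
        byte[] back = readDelimited(new ByteArrayInputStream(out.toByteArray()));
        System.out.println(back.length);
    }
}
```

Because the reader learns the message size before parsing the body, it can set its limit per message rather than relying on a fixed default such as the 64 MB cap behind the error reported above.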
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14653005#comment-14653005 ] Hudson commented on HBASE-13965: FAILURE: Integrated in HBase-TRUNK #6695 (See [https://builds.apache.org/job/HBase-TRUNK/6695/]) HBASE-13965 Addendum tries different connector ports if BindException is encountered (tedyu: rev 931e77d4507e1650c452cefadda450e0bf3f0528) * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0 Attachments: 13965-addendum.txt, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
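The addendum's approach, trying different connector ports when a BindException is encountered, can be sketched as follows. This is an assumption about the general technique, not the code in TestStochasticBalancerJmxMetrics: it uses a plain ServerSocket instead of a JMX connector, and bindFirstFree is a hypothetical helper name.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortRetryDemo {
    /**
     * Tries startPort, startPort + 1, ... and returns the first socket that binds.
     * Only BindException triggers a retry; other I/O errors propagate.
     */
    public static ServerSocket bindFirstFree(int startPort, int maxAttempts) throws IOException {
        BindException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return new ServerSocket(startPort + i);
            } catch (BindException e) {
                last = e; // port already taken; try the next one
            }
        }
        throw last != null ? last : new BindException("no attempts made");
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket taken = new ServerSocket(0);   // occupy any free port
             ServerSocket next = bindFirstFree(taken.getLocalPort(), 10)) {
            System.out.println("taken=" + taken.getLocalPort()
                + " bound=" + next.getLocalPort());
        }
    }
}
```

Bounding the number of attempts keeps a test from looping indefinitely on a machine where the whole port range is busy.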
[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651691#comment-14651691 ] Anoop Sam John commented on HBASE-14178:
Does this mean that when the current scan comes in with cache blocks set to false, we won't even consult the block cache for that read? IMO that is wrong. When the block is available in the cache, we should try to read it from there. Agreed, when the table's CF is set not to cache data from that CF at all, there is no point in looking into the cache. So both when the BC is disabled and when the CF being read is set to cache block = false, there is no need to obtain the lock at all. But the code change seems to do more than this. So IMO there can be 2 changes in the code:
1. Move the below piece of code inside the check if (cacheConf.isBlockCacheEnabled())
{code}
if (!useLock) {
  // check cache again with lock
  useLock = true;
  continue;
}
{code}
2. The outer if check (i.e. if (cacheConf.isBlockCacheEnabled())) itself should be changed to include the check for the CF-level cache setting (HCD#setBlockCacheEnabled).
regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, jstack
My regionserver blocks, and all client RPCs time out.
I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off.
{code}
B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
    - locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
    - locked 0x0005e5c55ad0 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at
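The first change proposed in the comment above (only retrying with the lock when the cache is actually in use) can be sketched as a toy read loop. This is not the HFileReaderV2 code: the class ReadBlockSketch, the in-memory map standing in for the block cache, and the counter standing in for offsetLock acquisition are all hypothetical, and the single cacheEnabled flag stands for whatever combined check (block cache enabled plus CF-level setting) the final patch uses.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the "check cache without lock, then again with lock" loop.
// Hypothetical names throughout; only the control flow mirrors the comment.
final class ReadBlockSketch {
    final Map<Long, String> cache = new HashMap<>();
    int lockAcquisitions = 0; // stands in for offsetLock.getLockEntry(offset)

    String readBlock(long offset, boolean cacheEnabled) {
        boolean useLock = false;
        while (true) {
            if (useLock) {
                lockAcquisitions++;
            }
            if (cacheEnabled) {
                String cached = cache.get(offset);
                if (cached != null) {
                    return cached;
                }
                if (!useLock) {
                    // check cache again with lock
                    useLock = true;
                    continue;
                }
            }
            // Cache miss, or cache not in use: load the block directly.
            String block = "block@" + offset;
            if (cacheEnabled) {
                cache.put(offset, block);
            }
            return block;
        }
    }

    public static void main(String[] args) {
        ReadBlockSketch reader = new ReadBlockSketch();
        reader.readBlock(42L, false);
        System.out.println("locks taken with cache off: " + reader.lockAcquisitions);
    }
}
```

With the retry placed inside the cacheEnabled branch, a read against a table whose cache is off never touches the lock in this model, which is the behavior the comment asks for.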
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Attachment: HBASE-14178_v3.patch
Update patch, modify TestFromClientSide.testCacheOnWriteEvictOnClose: After compaction, we don't modify the expectedBlockHits
regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, jstack
My regionserver blocks, and all client RPCs time out. I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off.
{code}
B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
    - locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
    - locked 0x0005e5c55ad0 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
    at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
    - 0x0005e5c55c08 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651962#comment-14651962 ] Ted Malaska commented on HBASE-14150: - Initial tests are successful. I'm going to do some cleanup and more tests, and I will submit a patch soon.
Add BulkLoad functionality to HBase-Spark Module Key: HBASE-14150 URL: https://issues.apache.org/jira/browse/HBASE-14150 Project: HBase Issue Type: New Feature Components: spark Reporter: Ted Malaska Assignee: Ted Malaska
Add on to the work done in HBASE-13992 to add functionality to do a bulk load from a given RDD. This will do the following:
1. Figure out the number of regions, then sort and partition the data correctly to be written out to HFiles.
2. Unlike the MR bulkload, I would like the columns to be sorted in the shuffle stage and not in the memory of the reducer. This will allow this design to support super wide records without running out of memory.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
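The partitioning half of step 1 above can be sketched without Spark or HBase on the classpath. The sketch assumes each region is identified by its start key, that the start keys are available in sorted order with an empty key for the first region, and it uses String keys instead of byte arrays for brevity. The class name RegionPartitioner and its methods are hypothetical and are not the API of the eventual patch.

```java
import java.util.Arrays;

// Hypothetical illustration: map a row key to the index of the region
// (and therefore the output partition) whose key range contains it.
final class RegionPartitioner {
    private final String[] startKeys; // sorted ascending; first entry is "" for the first region

    RegionPartitioner(String[] sortedStartKeys) {
        this.startKeys = sortedStartKeys.clone();
    }

    /** One partition per region. */
    int numPartitions() {
        return startKeys.length;
    }

    /** Index of the last region whose start key is less than or equal to rowKey. */
    int partitionFor(String rowKey) {
        int idx = Arrays.binarySearch(startKeys, rowKey);
        if (idx >= 0) {
            return idx; // rowKey is exactly a region start key
        }
        int insertionPoint = -idx - 1;
        return Math.max(0, insertionPoint - 1);
    }

    public static void main(String[] args) {
        RegionPartitioner p = new RegionPartitioner(new String[] {"", "g", "n"});
        for (String row : new String[] {"apple", "grape", "zebra"}) {
            System.out.println(row + " goes to partition " + p.partitionFor(row));
        }
    }
}
```

Sorting within each partition, which the description wants done in the shuffle stage, is deliberately left out of this sketch.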
[jira] [Commented] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651976#comment-14651976 ] Anoop Sam John commented on HBASE-14178:
{code}
public boolean shouldCacheBlockOnRead(BlockCategory category) {
  return isBlockCacheEnabled()
      && (cacheDataOnRead
          || category == BlockCategory.INDEX
          || category == BlockCategory.BLOOM
          || (prefetchOnOpen
              && (category != BlockCategory.META && category != BlockCategory.UNKNOWN)));
}
{code}
As you can see, the call to cacheConf.shouldCacheBlockOnRead(expectedBlockType.getCategory()) checks whether the read request asks for the block to be cached after this read. It says nothing about the CF-level setting of whether data is to be cached at all. We have to make that info available here in HFileReader. And when we read the index or meta blocks, we have to consult the BC (which you already do).
regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack
My regionserver blocks, and all client RPCs time out.
I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off.
{code}
B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Object.wait(Object.java:502)
    at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
    - locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
    at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
    - locked 0x0005e5c55ad0 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
    at
[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652247#comment-14652247 ] Lei Chen commented on HBASE-14082: --
Would it be simpler if we also put the replica_id in Regions instead of creating a new MBean? The replica id could then be queried using wildcard matching, without the need to search the name to replica_id map. e.g.
{code}
Regions: {
  namespace_default_table_foo_region_aaabbb_metric_mutateCount: 100,
  namespace_default_table_foo_region_aaabbb_metric_replicaid: 0,
  namespace_default_table_foo_region_bbbccc_metric_mutateCount: 100,
  namespace_default_table_foo_region_bbbccc_metric_replicaid: 1,
}
{code}
Add replica id to JMX metrics names --- Key: HBASE-14082 URL: https://issues.apache.org/jira/browse/HBASE-14082 Project: HBase Issue Type: Improvement Components: metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch
Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add replica id to JMX metrics names. The benefits may include, for example:
# Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out.
# Write requests on replicas are due to the replication process, while the ones on primary are from clients.
# When looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas.
To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
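For the name format proposed in the issue description above (a replicaid segment inserted before the metric segment), a monitoring client could build and parse attribute names roughly as follows. This is only a sketch under that assumption: the class RegionMetricName and its methods are hypothetical, and the regular expression assumes the namespace, table, and region parts do not themselves contain the literal separators.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the proposed naming scheme:
// namespace_<ns>_table_<table>_region_<region>_replicaid_<id>_metric_<metric>
final class RegionMetricName {
    private static final Pattern NAME = Pattern.compile(
        "namespace_(.+?)_table_(.+?)_region_(.+?)_replicaid_(\\d+)_metric_(.+)");

    /** Builds an attribute name in the proposed format. */
    static String format(String ns, String table, String region, int replicaId, String metric) {
        return "namespace_" + ns + "_table_" + table + "_region_" + region
            + "_replicaid_" + replicaId + "_metric_" + metric;
    }

    /** Returns the replica id encoded in the name, or -1 if the name is not in the proposed format. */
    static int replicaIdOf(String attributeName) {
        Matcher m = NAME.matcher(attributeName);
        return m.matches() ? Integer.parseInt(m.group(4)) : -1;
    }

    public static void main(String[] args) {
        String name = format("default", "foo", "aaabbb", 1, "mutateCount");
        System.out.println(name + " -> replica " + replicaIdOf(name));
    }
}
```

A hot-spot check that wants to ignore replicas could then keep only the names for which replicaIdOf returns 0.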
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652252#comment-14652252 ] Hadoop QA commented on HBASE-13965: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12748476/HBASE-13965-v10.patch against master branch at commit 4b6598e394bae67b54d6f741dd262afe03b2c133. ATTACHMENT ID: 12748476 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the master's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.sentry.tests.e2e.hive.hiveserver.AbstractHiveServer.createConnection(AbstractHiveServer.java:63) at org.apache.sentry.tests.e2e.hive.Context.createConnection(Context.java:92) at org.apache.sentry.tests.e2e.hive.AbstractTestWithStaticConfiguration.setupAdmin(AbstractTestWithStaticConfiguration.java:472) at org.apache.sentry.tests.e2e.dbprovider.TestDatabaseProvider.testGrantRevokeRoleToGroups(TestDatabaseProvider.java:2037) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14963//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/patchReleaseAuditWarnings.txt Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14963//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14963//console This message is automatically generated. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). 
We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652314#comment-14652314 ] Lei Chen commented on HBASE-13965: -- thanks, I will update soon Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652270#comment-14652270 ] Ted Yu commented on HBASE-13965: @Lei : The following file triggered release audit warning: hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource Take a look at ./hbase-hadoop2-compat/src/test/resources/META-INF/services/org.apache.hadoop.hbase.HadoopShims to see how license is added. I will commit the next patch with the above fix. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14131) HBase Backup/Restore Phase 2: Describe backup image
[ https://issues.apache.org/jira/browse/HBASE-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14131. --- Resolution: Implemented See parent JIRA (HBASE-14123). HBase Backup/Restore Phase 2: Describe backup image --- Key: HBASE-14131 URL: https://issues.apache.org/jira/browse/HBASE-14131 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14132) HBase Backup/Restore Phase 2: History of backups
[ https://issues.apache.org/jira/browse/HBASE-14132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14132. --- Resolution: Implemented See parent JIRA (HBASE-14123). HBase Backup/Restore Phase 2: History of backups Key: HBASE-14132 URL: https://issues.apache.org/jira/browse/HBASE-14132 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14180) Change timeout - SocketTimeoutException because of callTimeout
Adrià V. created HBASE-14180: Summary: Change timeout - SocketTimeoutException because of callTimeout Key: HBASE-14180 URL: https://issues.apache.org/jira/browse/HBASE-14180 Project: HBase Issue Type: Bug Components: hbase, regionserver, rpc, Zookeeper Affects Versions: 1.1.1 Environment: Hadoop with Ambari 2.1.0 HBase 1.1.1.2.3 HDFS 2.7.1.2.3 Zookeeper 3.4.6.2.3 Phoenix 4.4.0.2.3 Reporter: Adrià V.
HBase keeps throwing a timeout exception. I have tried every configuration I could think of to increase the timeout.
Partial stacktrace:
{quote}
Caused by: java.io.IOException: Call to hdp-w-1.c.dks-hadoop.internal/10.240.2.235:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, operationTimeout=6 expired.
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1242)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1210)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:213)
    at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:369)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:343)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    ... 4 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=43, waitTime=60001, operationTimeout=6 expired.
    at org.apache.hadoop.hbase.ipc.Call.checkAndSetTimeout(Call.java:70)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1184)
    ... 13 more
{quote}
I've tried editing config files and also setting config in Ambari with the following keys to increase the timeout, with no success:
- hbase.rpc.timeout
- dfs.socket.timeout
- dfs.client.socket-timeout
- zookeeper.session.timeout
Also the Phoenix properties, but I think it's mostly an HBase issue:
- phoenix.query.timeoutMs
- phoenix.query.keepAliveMs
Full stack trace:
{quote}
Error: Encountered exception in sub plan [0] execution. (state=,code=0)
java.sql.SQLException: Encountered exception in sub plan [0] execution.
    at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:157)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:251)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:241)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:240)
    at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1250)
    at sqlline.Commands.execute(Commands.java:822)
    at sqlline.Commands.sql(Commands.java:732)
    at sqlline.SqlLine.dispatch(SqlLine.java:808)
    at sqlline.SqlLine.begin(SqlLine.java:681)
    at sqlline.SqlLine.start(SqlLine.java:398)
    at sqlline.SqlLine.main(SqlLine.java:292)
Caused by: org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions:
Mon Aug 03 16:47:06 UTC 2015, null, java.net.SocketTimeoutException: callTimeout=6, callDuration=60303: row '' on table 'hive_post_topics' at region=hive_post_topics,,1438084107396.cdbdc246ff0b7dfed31d481e0bccd2b5., hostname=hdp-w-1.c.dks-hadoop.internal,16020,1438619912282, seqNum=45322
    at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:108)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:542)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
    at org.apache.phoenix.join.HashCacheClient.serialize(HashCacheClient.java:106)
    at org.apache.phoenix.join.HashCacheClient.addHashCache(HashCacheClient.java:82)
    at org.apache.phoenix.execute.HashJoinPlan$HashSubPlan.execute(HashJoinPlan.java:339)
    at org.apache.phoenix.execute.HashJoinPlan$1.call(HashJoinPlan.java:136)
    at
[jira] [Commented] (HBASE-12890) Provide a way to throttle the number of regions moved by the balancer
[ https://issues.apache.org/jira/browse/HBASE-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652192#comment-14652192 ] Dave Latham commented on HBASE-12890: - Thanks, Ted and Andrew. Can it be committed? Provide a way to throttle the number of regions moved by the balancer - Key: HBASE-12890 URL: https://issues.apache.org/jira/browse/HBASE-12890 Project: HBase Issue Type: Improvement Affects Versions: 0.98.10 Reporter: churro morales Assignee: churro morales Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: HBASE-12890.patch We have a very large cluster and we frequently add and remove quite a few regionservers from our cluster. Whenever we do this, the balancer moves thousands of regions at once. Instead, we provide a configuration parameter: hbase.balancer.max.regions. This limits the number of regions that are balanced per iteration. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
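The throttling idea described above amounts to capping how many region moves are applied in one balancer iteration. A minimal sketch of that cap follows; it does not read hbase.balancer.max.regions from any real configuration, and the class MoveThrottle, the method limit, and the use of plain strings for region plans are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a per-iteration cap on region moves.
final class MoveThrottle {
    /** Returns at most maxRegions plans, keeping the original order. */
    static <T> List<T> limit(List<T> plans, int maxRegions) {
        if (maxRegions < 0) {
            throw new IllegalArgumentException("maxRegions must be >= 0");
        }
        int n = Math.min(maxRegions, plans.size());
        return new ArrayList<>(plans.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> plans = Arrays.asList("region-1", "region-2", "region-3");
        System.out.println("moving this iteration: " + limit(plans, 2));
    }
}
```

Plans cut off by the cap are simply left for a later iteration in this model; how the actual patch chooses which regions to move first is not stated in the message.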
[jira] [Commented] (HBASE-14133) HBase Backup/Restore Phase 2: Status (and progress) of backup request
[ https://issues.apache.org/jira/browse/HBASE-14133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652098#comment-14652098 ] Vladimir Rodionov commented on HBASE-14133: --- See parent JIRA (HBASE-14123). HBase Backup/Restore Phase 2: Status (and progress) of backup request - Key: HBASE-14133 URL: https://issues.apache.org/jira/browse/HBASE-14133 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14133) HBase Backup/Restore Phase 2: Status (and progress) of backup request
[ https://issues.apache.org/jira/browse/HBASE-14133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14133. --- Resolution: Implemented HBase Backup/Restore Phase 2: Status (and progress) of backup request - Key: HBASE-14133 URL: https://issues.apache.org/jira/browse/HBASE-14133 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14123) HBase Backup/Restore Phase 2
[ https://issues.apache.org/jira/browse/HBASE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14123: -- Attachment: HBASE-14123-v1.patch First patch, incorporates (HBASE-14125, HBASE-14130, HBASE-14131, HBASE-14132, HBASE-14133) HBase Backup/Restore Phase 2 Key: HBASE-14123 URL: https://issues.apache.org/jira/browse/HBASE-14123 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-14123-v1.patch Phase 2 umbrella JIRA. See HBASE-7912 for design document and description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14125) HBase Backup/Restore Phase 2: Cancel backup
[ https://issues.apache.org/jira/browse/HBASE-14125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14125. --- Resolution: Implemented See parent JIRA (HBASE-14123). HBase Backup/Restore Phase 2: Cancel backup --- Key: HBASE-14125 URL: https://issues.apache.org/jira/browse/HBASE-14125 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Cancel backup operation.
[jira] [Resolved] (HBASE-14130) HBase Backup/Restore Phase 2: Delete backup image
[ https://issues.apache.org/jira/browse/HBASE-14130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-14130. --- Resolution: Implemented See parent JIRA (HBASE-14123). HBase Backup/Restore Phase 2: Delete backup image - Key: HBASE-14130 URL: https://issues.apache.org/jira/browse/HBASE-14130 Project: HBase Issue Type: New Feature Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov
[jira] [Updated] (HBASE-14178) regionserver blocks because of waiting for offsetLock
[ https://issues.apache.org/jira/browse/HBASE-14178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14178: -- Attachment: HBASE-14178_v4.patch regionserver blocks because of waiting for offsetLock - Key: HBASE-14178 URL: https://issues.apache.org/jira/browse/HBASE-14178 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.98.6 Reporter: Heng Chen Priority: Critical Fix For: 0.98.6 Attachments: HBASE-14178-0.98.patch, HBASE-14178.patch, HBASE-14178_v1.patch, HBASE-14178_v2.patch, HBASE-14178_v3.patch, HBASE-14178_v4.patch, jstack
My regionserver blocks, and all client RPCs time out. I printed the regionserver's jstack; it seems a lot of threads were blocked waiting for offsetLock. Detailed information is below. PS: my table's block cache is off.
{code}
B.DefaultRpcServer.handler=2,queue=2,port=60020 #82 daemon prio=5 os_prio=0 tid=0x01827000 nid=0x2cdc in Object.wait() [0x7f3831b72000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:79)
	- locked 0x000773af7c18 (a org.apache.hadoop.hbase.util.IdLock$Entry)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:352)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:253)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:524)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:572)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:257)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:173)
	at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:313)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:269)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:695)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekAsDirection(StoreScanner.java:683)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:533)
	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:140)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:3889)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3969)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3847)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3820)
	- locked 0x0005e5c55ad0 (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3807)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4779)
	at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4753)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2916)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29583)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
	at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- 0x0005e5c55c08 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
{code}
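The trace in HBASE-14178 shows a handler thread waiting in IdLock.getLockEntry, called from HFileReaderV2.readBlock. The general pattern of a lock keyed by block offset can be sketched as follows. This is a generic per-key lock built on java.util.concurrent, not the HBase IdLock implementation; the class `OffsetLockSketch` and method `withLock` are hypothetical names, and the sketch never evicts lock entries, which real code would need to handle.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class OffsetLockSketch {

    // One lock per block offset. Entries are never removed in this sketch.
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Runs the read while holding the lock for the given offset. */
    <T> T withLock(long offset, Supplier<T> read) {
        ReentrantLock lock = locks.computeIfAbsent(offset, k -> new ReentrantLock());
        lock.lock();
        try {
            return read.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OffsetLockSketch offsetLocks = new OffsetLockSketch();
        int[] reads = {0};
        Thread[] handlers = new Thread[4];
        for (int i = 0; i < handlers.length; i++) {
            // Every handler asks for the same offset, so the reads run one at a time.
            handlers[i] = new Thread(() -> offsetLocks.withLock(42L, () -> ++reads[0]));
            handlers[i].start();
        }
        for (Thread handler : handlers) {
            handler.join();
        }
        System.out.println(reads[0]);
    }
}
```

In a scheme like this, requests for different offsets proceed in parallel, while requests for the same offset queue behind whichever thread holds that offset's lock. If the holder stalls, every other thread asking for that offset waits with it.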
[jira] [Updated] (HBASE-14150) Add BulkLoad functionality to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14150: Attachment: HBASE-14150.1.patch First draft of BulkLoad with Spark. This patch includes: 1. HBaseContext Implementation 2. RDD Implicit Implementation 3. Unit Test Add BulkLoad functionality to HBase-Spark Module Key: HBASE-14150 URL: https://issues.apache.org/jira/browse/HBASE-14150 Project: HBase Issue Type: New Feature Components: spark Reporter: Ted Malaska Assignee: Ted Malaska Attachments: HBASE-14150.1.patch Add on to the work done in HBASE-13992 to add functionality to do a bulk load from a given RDD. This will do the following: 1. Figure out the number of regions, then sort and partition the data correctly to be written out to HFiles. 2. Unlike the MR bulkload, I would like the columns to be sorted in the shuffle stage and not in the memory of the reducer. This will allow this design to support super wide records without running out of memory.
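The partitioning step in item 1 of HBASE-14150 can be sketched as a lookup from row key to region index. This is a minimal illustration, assuming the region start keys are available as a sorted array whose first entry is the empty key. The class `RegionPartitionSketch` and its methods are hypothetical names and are not taken from the patch or from the HBase-Spark module.

```java
import java.nio.charset.StandardCharsets;

public class RegionPartitionSketch {

    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns the index of the last start key that is less than or equal to
     * rowKey. Assumes startKeys is sorted ascending and startKeys[0] is empty.
     */
    static int partitionFor(byte[][] startKeys, byte[] rowKey) {
        int lo = 0;
        int hi = startKeys.length - 1;
        int answer = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (compareKeys(startKeys[mid], rowKey) <= 0) {
                answer = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return answer;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] startKeys = {key(""), key("g"), key("n")};
        System.out.println(partitionFor(startKeys, key("m")));
    }
}
```

With one partition per region, sorting within each partition by row key and then by column gives each output file its cells in order without buffering a whole row in memory, which is the motivation stated in item 2.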
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Chen updated HBASE-13965: - Attachment: HBASE-13965-v11.patch Updates: 1. License added for {{hbase-hadoop2-compat/src/main/resources/x/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource}} Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png
[jira] [Updated] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13965: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks for the patch, Lei. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png
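HBASE-13965 asks for a way to attribute each cost function's contribution to the overall cost of a balancing plan. The arithmetic behind that attribution can be sketched as follows, assuming the overall cost is the sum of each function's raw cost multiplied by its tunable weight. The class `CostAttributionSketch`, the method `weightedCosts`, the "Overall" key, and the weights and raw costs in `main` are hypothetical illustration values, not taken from the patch or from a real cluster.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CostAttributionSketch {

    /**
     * Returns each function's weighted cost, keyed by function name, followed
     * by an "Overall" entry holding their sum. A function with no raw cost
     * reported contributes zero (assumption).
     */
    static Map<String, Double> weightedCosts(Map<String, Double> weights,
                                             Map<String, Double> rawCosts) {
        Map<String, Double> result = new LinkedHashMap<>();
        double overall = 0.0;
        for (Map.Entry<String, Double> entry : weights.entrySet()) {
            double raw = rawCosts.getOrDefault(entry.getKey(), 0.0);
            double weighted = entry.getValue() * raw;
            result.put(entry.getKey(), weighted);
            overall += weighted;
        }
        result.put("Overall", overall);
        return result;
    }

    public static void main(String[] args) {
        // Made-up weights and raw costs, for illustration only.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("LocalityCost", 2.0);
        weights.put("RegionCountSkewCost", 4.0);

        Map<String, Double> rawCosts = new LinkedHashMap<>();
        rawCosts.put("LocalityCost", 0.5);
        rawCosts.put("RegionCountSkewCost", 0.25);

        System.out.println(weightedCosts(weights, rawCosts));
    }
}
```

Publishing the per-function values next to the overall value is what makes weight tuning tractable: an operator can see which function dominates the total before changing its weight.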