[jira] [Updated] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HBASE-7568: Attachment: (was: HBASE-7568.trunkv1) [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HBASE-7568: Attachment: HBASE-7568-trunk-v1.patch Renaming patch file so hadoopqa will pick it up. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch
[jira] [Updated] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7803: --- Attachment: trunk-7803.patch Look into REST API performance -- Key: HBASE-7803 URL: https://issues.apache.org/jira/browse/HBASE-7803 Project: HBase Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-7803.patch I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue.
[jira] [Updated] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7803: --- Status: Patch Available (was: Open) Look into REST API performance -- Key: HBASE-7803 URL: https://issues.apache.org/jira/browse/HBASE-7803 Project: HBase Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-7803.patch I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue.
[jira] [Commented] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605520#comment-13605520 ]

Jimmy Xiang commented on HBASE-7803:

Attached a patch to make REST support caching.

Look into REST API performance
Key: HBASE-7803
URL: https://issues.apache.org/jira/browse/HBASE-7803
Project: HBase
Issue Type: Task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Attachments: trunk-7803.patch

I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue.
[jira] [Updated] (HBASE-8097) MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
[ https://issues.apache.org/jira/browse/HBASE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-8097:
Fix Version/s: (was: 0.96.0) 0.98.0 0.95.0
Hadoop Flags: Reviewed

Integrated to 0.95 and trunk. Thanks for the patch, Jeffrey. Thanks for the reviews, Jimmy, Nicolas and Chunhui.

MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
Key: HBASE-8097
URL: https://issues.apache.org/jira/browse/HBASE-8097
Project: HBase
Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 0.95.0, 0.98.0
Attachments: 8097.txt, hbase-8097_1.patch, hbase-8097_v2.patch, hbase-8097_v3.patch

{code}
} catch (IOException ioe) {
  this.services.getExecutorService().submit(this);
  this.deadServers.add(serverName);
  throw new IOException("failed log splitting for " + serverName + ", will retry", ioe);
}
{code}

this.deadServers.add(serverName); will keep incrementing DeadServer.numProcessing.

We can't get rid of numProcessing by just checking deadServers.size() because deadServers is also used to report some historically failed RSs.
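To illustrate the problem described in HBASE-8097, here is a minimal, self-contained sketch. It is not HBase code: the Tracker class, its method names, and the "count only the first add" guard are all assumptions made for illustration, and the guard is not necessarily what the committed patch does. It only shows why a handler that resubmits itself and re-adds the server on every failure inflates a processing counter.

```java
import java.util.HashSet;
import java.util.Set;

class NumProcessingSketch {
  /** Hypothetical stand-in for a dead-server tracker; not HBase's DeadServer class. */
  static class Tracker {
    private final Set<String> deadServers = new HashSet<>();
    private int numProcessing = 0;

    /** Mirrors the reported behaviour: every call bumps the counter, even on a retry. */
    void addUnconditionally(String serverName) {
      deadServers.add(serverName);
      numProcessing++;
    }

    /** One possible guard (an assumption): count a server only the first time it is added. */
    void addOnce(String serverName) {
      if (deadServers.add(serverName)) {
        numProcessing++;
      }
    }

    int getNumProcessing() {
      return numProcessing;
    }
  }

  /** Simulates a handler that fails and is resubmitted, running 'attempts' times in total. */
  static int simulate(int attempts, boolean guarded) {
    Tracker tracker = new Tracker();
    for (int i = 0; i < attempts; i++) {
      if (guarded) {
        tracker.addOnce("rs-1");
      } else {
        tracker.addUnconditionally("rs-1");
      }
    }
    return tracker.getNumProcessing();
  }

  public static void main(String[] args) {
    System.out.println("unguarded: " + simulate(3, false));
    System.out.println("guarded:   " + simulate(3, true));
  }
}
```

With three attempts for the same server, the unguarded path leaves the counter at 3 while only one server is actually being processed.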
[jira] [Commented] (HBASE-8128) HTable#put improvements
[ https://issues.apache.org/jira/browse/HBASE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605527#comment-13605527 ]

nkeywal commented on HBASE-8128:

Committed in 0.94

HTable#put improvements
Key: HBASE-8128
URL: https://issues.apache.org/jira/browse/HBASE-8128
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.95.0, 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.95.0, 0.96.0
Attachments: 8128.v1.patch

3 points:
- When doing a single put, we're creating an object by calling Arrays.asList
- we're doing a size check every 10 puts. Not doing it seems simpler, better and allows sharing some code between a single put and a list of puts.
- we could call flushCommits on empty write buffer, especially for someone using a lot of HTable instead of using a pool, as it's called in close().
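The three points in HBASE-8128 can be pictured with a small sketch. This is not the HTable implementation: the class, the String "mutations", and the character-count threshold are hypothetical stand-ins, and the early return on an empty buffer reflects one reading of the third point (that a flush on an empty buffer, for example from close(), should be cheap).

```java
import java.util.ArrayList;
import java.util.List;

class WriteBufferSketch {
  private final List<String> writeBuffer = new ArrayList<>();
  private final long flushThreshold; // hypothetical size limit, counted in characters
  private long buffered = 0;
  private int flushes = 0;

  WriteBufferSketch(long flushThreshold) {
    this.flushThreshold = flushThreshold;
  }

  /** Single put: no temporary list is created, it goes straight to the shared helper. */
  void put(String mutation) {
    doPut(mutation);
    flushIfFull();
  }

  /** List of puts: reuses the same helper, then checks the buffer size once. */
  void put(List<String> mutations) {
    for (String mutation : mutations) {
      doPut(mutation);
    }
    flushIfFull();
  }

  private void doPut(String mutation) {
    writeBuffer.add(mutation);
    buffered += mutation.length();
  }

  private void flushIfFull() {
    if (buffered >= flushThreshold) {
      flushCommits();
    }
  }

  /** Returns immediately when there is nothing buffered. */
  void flushCommits() {
    if (writeBuffer.isEmpty()) {
      return;
    }
    writeBuffer.clear();
    buffered = 0;
    flushes++;
  }

  /** close() always asks for a flush, which is a no-op on an empty buffer. */
  void close() {
    flushCommits();
  }

  int flushCount() {
    return flushes;
  }
}
```

Both put variants share doPut, so neither needs a wrapper list or a periodic size check of its own.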
[jira] [Commented] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605528#comment-13605528 ]

Jimmy Xiang commented on HBASE-7803:

I did some testing on my 4 nodes cluster with ycsb and here is the scan throughput I got with REST API:

With caching, and using batch: 8.83
With caching, but no batch: 0.99
No caching, but using batch: 1.85
No caching, no batch: 0.68

On the same cluster, using the HBase client java API, the throughput I got is: 29.04

Look into REST API performance
Key: HBASE-7803
URL: https://issues.apache.org/jira/browse/HBASE-7803
Project: HBase
Issue Type: Task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Attachments: trunk-7803.patch

I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue.
[jira] [Updated] (HBASE-8128) HTable#put improvements
[ https://issues.apache.org/jira/browse/HBASE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-8128:
Fix Version/s: 0.94.8

HTable#put improvements
Key: HBASE-8128
URL: https://issues.apache.org/jira/browse/HBASE-8128
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.95.0, 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.95.0, 0.96.0, 0.94.8
Attachments: 8128.v1.patch

3 points:
- When doing a single put, we're creating an object by calling Arrays.asList
- we're doing a size check every 10 puts. Not doing it seems simpler, better and allows sharing some code between a single put and a list of puts.
- we could call flushCommits on empty write buffer, especially for someone using a lot of HTable instead of using a pool, as it's called in close().
[jira] [Commented] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605535#comment-13605535 ]

Jimmy Xiang commented on HBASE-7803:

Batch means fewer REST HTTP trips. Caching means fewer trips to region servers. Based on the results, it seems both are performance killers, and HTTP overhead has more impact.

Look into REST API performance
Key: HBASE-7803
URL: https://issues.apache.org/jira/browse/HBASE-7803
Project: HBase
Issue Type: Task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Attachments: trunk-7803.patch

I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue.
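The throughput figures posted earlier in this thread for HBASE-7803 can be turned into ratios to check the conclusion that HTTP overhead matters more than region-server trips. The sketch below only does that arithmetic on the reported numbers; the class and method names are made up for illustration, and the units are whatever YCSB reported (the thread does not say).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RestScanThroughput {
  /** Throughput reported for the HBase Java client on the same cluster. */
  static final double JAVA_CLIENT = 29.04;

  /** REST scan throughput figures as reported in the thread. */
  static Map<String, Double> reported() {
    Map<String, Double> results = new LinkedHashMap<>();
    results.put("caching + batch", 8.83);
    results.put("caching, no batch", 0.99);
    results.put("no caching, batch", 1.85);
    results.put("no caching, no batch", 0.68);
    return results;
  }

  static double ratio(double value, double reference) {
    return value / reference;
  }

  public static void main(String[] args) {
    double baseline = reported().get("no caching, no batch");
    for (Map.Entry<String, Double> entry : reported().entrySet()) {
      System.out.printf("%-22s %.2fx of baseline, %.1f%% of Java client%n",
          entry.getKey(),
          ratio(entry.getValue(), baseline),
          100.0 * ratio(entry.getValue(), JAVA_CLIENT));
    }
  }
}
```

Relative to the run with neither option, batching alone gives roughly 2.7x, caching alone roughly 1.5x, and both together roughly 13x, which is still only about 30% of the Java client figure. That is consistent with the observation that batching (fewer HTTP trips) has the larger individual effect.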
[jira] [Reopened] (HBASE-7597) testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jimmy Xiang reopened HBASE-7597:
Assignee: Jimmy Xiang

It failed again: https://builds.apache.org/job/HBase-0.95/82/
Let me reopen it and see if I can do something about it.

testRegionShouldNotBeDeployed seems to be flaky
Key: HBASE-7597
URL: https://issues.apache.org/jira/browse/HBASE-7597
Project: HBase
Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jimmy Xiang

I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done.

Failed tests:
testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4
testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec
testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[]
testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher)
[jira] [Updated] (HBASE-8097) MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
[ https://issues.apache.org/jira/browse/HBASE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-8097:
Resolution: Fixed
Status: Resolved (was: Patch Available)

MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
Key: HBASE-8097
URL: https://issues.apache.org/jira/browse/HBASE-8097
Project: HBase
Issue Type: Bug
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 0.95.0, 0.98.0
Attachments: 8097.txt, hbase-8097_1.patch, hbase-8097_v2.patch, hbase-8097_v3.patch

{code}
} catch (IOException ioe) {
  this.services.getExecutorService().submit(this);
  this.deadServers.add(serverName);
  throw new IOException("failed log splitting for " + serverName + ", will retry", ioe);
}
{code}

this.deadServers.add(serverName); will keep incrementing DeadServer.numProcessing.

We can't get rid of numProcessing by just checking deadServers.size() because deadServers is also used to report some historically failed RSs.
[jira] [Updated] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-7597:
Summary: TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky (was: testRegionShouldNotBeDeployed seems to be flaky)

TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
Key: HBASE-7597
URL: https://issues.apache.org/jira/browse/HBASE-7597
Project: HBase
Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jimmy Xiang

I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done.

Failed tests:
testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4
testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec
testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[]
testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher)
[jira] [Commented] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605554#comment-13605554 ]

Ted Yu commented on HBASE-7568:

From https://builds.apache.org/job/PreCommit-HBASE-Build/4871/console, it looks like the patch doesn't compile (against hadoop 2.0, at least)

[replication] Create an interface for replication queues
Key: HBASE-7568
URL: https://issues.apache.org/jira/browse/HBASE-7568
Project: HBase
Issue Type: Sub-task
Components: Replication
Reporter: Chris Trezzo
Assignee: Chris Trezzo
Fix For: 0.95.0, 0.96.0, 0.98.0
Attachments: HBASE-7568-trunk-v1.patch
[jira] [Updated] (HBASE-8128) HTable#put improvements
[ https://issues.apache.org/jira/browse/HBASE-8128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-8128:
Fix Version/s: (was: 0.94.8) 0.94.7

HTable#put improvements
Key: HBASE-8128
URL: https://issues.apache.org/jira/browse/HBASE-8128
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.95.0, 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Trivial
Fix For: 0.95.0, 0.96.0, 0.94.7
Attachments: 8128.v1.patch

3 points:
- When doing a single put, we're creating an object by calling Arrays.asList
- we're doing a size check every 10 puts. Not doing it seems simpler, better and allows sharing some code between a single put and a list of puts.
- we could call flushCommits on empty write buffer, especially for someone using a lot of HTable instead of using a pool, as it's called in close().
[jira] [Commented] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605572#comment-13605572 ]

Sergey Shelukhin commented on HBASE-7679:

bq. ConcatenatedLists should have unit test.
Added.

bq. Should this define be in the Interface or do you think it implementation specific?
bq. + public static final String BLOCKING_STOREFILES_KEY = "hbase.hstore.blockingStoreFiles";
Not certain, having things in HStore seems to be the convention. Store isn't a real interface that invites different implementation :)

bq. On StripeStoreFileManager, do we know if this approach has merit? Have we run models or actual test runs and can see it saves i/o? Would be interesting to know. Do we have to commit it to figure this out? I can see committing all the refactorings which allow different compaction policies but would think a compaction engine would need to have proven merit before it goes in? What you think Sergey?
I have an integration test in HBASE-8000, but have only run it for correctness now. I plan to make a bigger test for perf, and move to commit after having some numbers.

bq. Do we have to have a L0? Can we not flush multiple files when we flush, one per boundary in the region? Was that thought just too much work flushing?
I was concerned about many small files, and scope creep into memstore, as discussed. Let me do a write-up on this (probably useful anyway), and discuss on dev list. After integration tests on tiny files (not a target scenario for this, but still), I wonder if impact of L0 files on # of files to be read for gets is indeed worth it. On the other hand for scans, and for overall situation large number of small files is not good.

implement store file management for stripe compactions
Key: HBASE-7679
URL: https://issues.apache.org/jira/browse/HBASE-7679
Project: HBase
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch
[jira] [Updated] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7679: Attachment: HBASE-7679-v10.patch updated patch implement store file management for stripe compactions -- Key: HBASE-7679 URL: https://issues.apache.org/jira/browse/HBASE-7679 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch
[jira] [Commented] (HBASE-8127) Region of a disabling or disabled table could be stuck in transition state when RS dies during Master initialization
[ https://issues.apache.org/jira/browse/HBASE-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605582#comment-13605582 ]

rajeshbabu commented on HBASE-8127:

bq. One time when I saw the opening RIT stuck is due to the offlineDisabledRegion function in assignment manager. As you can see we don't handle opening RIT inside the function.

If I am not wrong, the HBASE-7824 patch was applied at that time, right? One problem I suspect with the HBASE-7824 patch is:

{code}
+if (preMetaServer != null && failedServers.contains(preMetaServer)) {
+  // create recovered edits file for .META. server
+  this.fileSystemManager.splitLog(preMetaServer);
+  failedServers.remove(preMetaServer);
+}
{code}

If a RS carrying ROOT or META went down, we are not calling SSH for that RS (not even adding it to deadservers). We handle the dead server's regions in transition through processRegionsInTransitions, which can leave a RIT stuck in the OPENING state. If the znode is in RS_ZK_REGION_OPENING state, we just add it to RIT and wait for TM to handle it.

{code}
regionsInTransition.put(encodedRegionName,
    new RegionState(regionInfo, RegionState.State.OPENING, data.getStamp(), data.getOrigin()));
failoverProcessedRegions.put(encodedRegionName, regionInfo);
{code}

Whenever TM handles it, we will assign; in that case the RIT can get stuck because it sees the table as DISABLING/DISABLED. If the RS is really ALIVE this case won't happen, because unassign will be called after assignment.

For the HBASE-7824 patch we can make the change below, which avoids RITs getting stuck in the opening state. If the meta RS is down before/during master restart, we can add it to deadservers and start SSH, passing shouldSplitHlog as false because the logs are already split.

{code}
this.deadservers.add(serverName);
this.services.getExecutorService().submit(
    new ServerShutdownHandler(this.master, this.services, this.deadservers, serverName, false));
{code}

Anyway, the actual problem you gave in the description can be handled on the SSH side. I am working on it.

One more thing about your feedback patch:

{code}
+// delete RITs if exists in any state of disabling or disabled tables during master starts
+// up
+if (!hri.isMetaTable()) {
+  String tableName = hri.getTableNameAsString();
+  boolean disabled = this.zkTable.isDisabledTable(tableName);
+  if (disabled || this.zkTable.isDisablingTable(tableName)) {
+    ZKAssign.deleteNodeFailSilent(watcher, hri);
+    regionOffline(hri);
+    continue;
+  }
+}
{code}

We don't know whether the DISABLING table's region is already closed on the RS or not, so we should not offline the region directly. In SSH we can, because the RS has gone down.

Region of a disabling or disabled table could be stuck in transition state when RS dies during Master initialization
Key: HBASE-8127
URL: https://issues.apache.org/jira/browse/HBASE-8127
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.5
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Fix For: 0.94.7
Attachments: HBASE-8127_feedback.patch, HBASE-8127.patch, hbase-8127_v1.patch, reproduce-hang.patch

The issue happens when a RS dies while a master starts up. After the RS reports open to the new master instance and dies immediately thereafter, the regions of disabling (or disabled) tables on the dead RS will stay in RIT state forever. I attached a patch to simulate the situation and you can run the following command to reproduce the issue:

{code}mvn test -PlocalTests -Dtest=TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS{code}

Basically, we skip regions of a dead server inside AM.processDeadServersAndRecoverLostRegions, as in the following code, and rely on SSH to process those skipped regions:

{code}
for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) {
  nodes.remove(deadRegion.getFirst().getEncodedName());
}
{code}

While in SSH, we skip regions of disabling (or disabled) tables again in the function processDeadRegion. The end result is that RITs of disabling (or disabled) tables are stuck there forever.
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605583#comment-13605583 ]

Sergey Shelukhin commented on HBASE-7055:

Ping?

port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
Key: HBASE-7055
URL: https://issues.apache.org/jira/browse/HBASE-7055
Project: HBase
Issue Type: Task
Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Fix For: 0.95.0
Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, HBASE-6371-v3-refactor-only-squashed.patch, HBASE-6371-v4-refactor-only-squashed.patch, HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, HBASE-7055-v7.patch, HBASE-7055-v7.patch

See HBASE-6371 for details.
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605586#comment-13605586 ]

Ted Yu commented on HBASE-7295:

I ran TestRowProcessorEndpoint with trunk patch v4 and it passed. +1

Contention in HBaseClient.getConnection
Key: HBASE-7295
URL: https://issues.apache.org/jira/browse/HBASE-7295
Project: HBase
Issue Type: Improvement
Affects Versions: 0.94.3
Reporter: Varun Sharma
Assignee: Varun Sharma
Fix For: 0.95.0
Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt

HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would timeout after maxIdleTime - underlying PoolMap connections object is thread safe.
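The idea in HBASE-7295 is to stop taking a global lock on every getConnection() call and rely on a thread-safe map instead. The sketch below shows that general pattern with java.util.concurrent.ConcurrentHashMap; it is not the HBase patch and does not use HBase's PoolMap. The Connection class and the String remote id are hypothetical. Note one difference from the issue text: here a thread that loses the race simply drops its extra connection, whereas the issue describes extra pooled connections timing out after maxIdleTime.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCacheSketch {
  /** Hypothetical connection handle; a real one would own a socket. */
  static final class Connection {
    final String remoteId;

    Connection(String remoteId) {
      this.remoteId = remoteId;
    }
  }

  private final ConcurrentMap<String, Connection> connections = new ConcurrentHashMap<>();
  private final AtomicInteger created = new AtomicInteger();

  /** Looks up or creates a connection without a synchronized block around the map. */
  Connection getConnection(String remoteId) {
    Connection existing = connections.get(remoteId); // lock-free fast path
    if (existing != null) {
      return existing;
    }
    // Two threads may both reach this point and each build a connection:
    // this is the "slightly over the limit" case accepted in the issue.
    Connection fresh = new Connection(remoteId);
    created.incrementAndGet();
    Connection raced = connections.putIfAbsent(remoteId, fresh);
    return raced != null ? raced : fresh; // the loser of the race drops its copy
  }

  int cachedCount() {
    return connections.size();
  }

  int createdCount() {
    return created.get();
  }
}
```

The common case, a hit on an existing entry, never blocks other callers, which is what matters with thousands of gateway threads.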
[jira] [Commented] (HBASE-7992) provide pre/post region offline hooks for HMaster.offlineRegion()
[ https://issues.apache.org/jira/browse/HBASE-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605590#comment-13605590 ]

Hudson commented on HBASE-7992:

Integrated in HBase-TRUNK #3969 (See [https://builds.apache.org/job/HBase-TRUNK/3969/])
HBASE-7992 provide pre/post region offline hooks for HMaster.offlineRegion() (Rajeshbabu) (Revision 1457854)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/BaseMasterObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java

provide pre/post region offline hooks for HMaster.offlineRegion()
Key: HBASE-7992
URL: https://issues.apache.org/jira/browse/HBASE-7992
Project: HBase
Issue Type: Bug
Components: Coprocessors
Affects Versions: 0.95.0
Reporter: rajeshbabu
Assignee: rajeshbabu
Fix For: 0.98.0
Attachments: 7992_trunk_3.patch, HBASE-7992_trunk_2.patch, HBASE-7992_trunk.patch

presently no hooks to provide access control to offline region in master.
[jira] [Commented] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605592#comment-13605592 ]

stack commented on HBASE-7679:

Agreed, let's get numbers before saying L0 is bad. Ditto, get numbers before commit, and yes, a write-up would be helpful. Smarter compaction could make for big wins all around, Sergey. Thanks for persisting.

implement store file management for stripe compactions
Key: HBASE-7679
URL: https://issues.apache.org/jira/browse/HBASE-7679
Project: HBase
Issue Type: Sub-task
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch
[jira] [Commented] (HBASE-7481) throw IOExceptions from Filter methods?
[ https://issues.apache.org/jira/browse/HBASE-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605593#comment-13605593 ] Hadoop QA commented on HBASE-7481: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574069/HBASE-7481-1.0.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4869//console This message is automatically generated. throw IOExceptions from Filter methods? --- Key: HBASE-7481 URL: https://issues.apache.org/jira/browse/HBASE-7481 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Fix For: 0.95.0, 0.98.0 Attachments: HBASE-7481-1.0.txt Currently there is no way to throw custom IOExceptions from any of the filter methods. For implementers of custom filters that presents a problem. For example, there are scenarios where the filter would want to indicate to the client that it should not retry. Currently there is no way of doing that.
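A minimal, self-contained Java sketch of what the HBASE-7481 description asks for: a filter method that may throw an IOException, including one that tells the caller not to retry. This does not use the real HBase Filter API. `RowFilter`, `NonRetryableIOException`, and `scanWithRetries` are hypothetical names made up for illustration, and the retry loop is only an assumption about how a client might react.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical "do not retry" signal for this sketch; not an HBase class.
class NonRetryableIOException extends IOException {
    NonRetryableIOException(String message) { super(message); }
}

// Hypothetical filter contract whose method is allowed to throw IOException.
interface RowFilter {
    boolean accept(String row) throws IOException;
}

class FilterRetrySketch {
    // Counts accepted rows. An ordinary IOException restarts the pass,
    // a NonRetryableIOException is rethrown immediately.
    static int scanWithRetries(List<String> rows, RowFilter filter, int maxAttempts)
            throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                int accepted = 0;
                for (String row : rows) {
                    if (filter.accept(row)) {
                        accepted++;
                    }
                }
                return accepted;
            } catch (NonRetryableIOException e) {
                throw e; // the filter asked the caller not to retry
            } catch (IOException e) {
                last = e; // treated as transient: try the pass again
            }
        }
        throw last != null ? last : new IOException("no attempts were made");
    }

    public static void main(String[] args) throws IOException {
        RowFilter filter = row -> {
            if (row.startsWith("bad")) {
                throw new NonRetryableIOException("malformed row: " + row);
            }
            return row.length() > 3;
        };
        System.out.println(scanWithRetries(Arrays.asList("row-1", "ab", "row-2"), filter, 3));
    }
}
```

The point of the sketch is only the `throws IOException` on the filter method: without it, the filter has no way to pass a "stop retrying" signal through to the caller.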
[jira] [Commented] (HBASE-8067) TestHFileArchiving.testArchiveOnTableDelete sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605596#comment-13605596 ] Ted Yu commented on HBASE-8067: --- Looks like this test failed again in trunk build #3969 TestHFileArchiving.testArchiveOnTableDelete sometimes fails --- Key: HBASE-8067 URL: https://issues.apache.org/jira/browse/HBASE-8067 Project: HBase Issue Type: Bug Components: Admin, master, test Affects Versions: 0.96.0, 0.94.6 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.95.0, 0.94.7 Attachments: HBASE-8067-debug.patch, HBASE-8067-v0.patch it seems that testArchiveOnTableDelete() fails because the archiving in DeleteTableHandler is still in progress when admin.deleteTable() returns. {code} Error Message Archived files are missing some of the store files! Stacktrace java.lang.AssertionError: Archived files are missing some of the store files! at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.backup.TestHFileArchiving.testArchiveOnTableDelete(TestHFileArchiving.java:262) {code} (Looking at the problem in a more generic way, we don't have any way to inform the client when an async operation is completed) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
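Since the description above says the archiving may still be in progress when admin.deleteTable() returns, one common way to make such a test less flaky is to poll for the expected state with a deadline instead of asserting immediately. The following is a minimal Java sketch of that idea. `waitFor` is a hypothetical helper, not an HBase test utility, and the background thread only simulates an asynchronous archiver.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

class AsyncWaitSketch {
    // Polls the condition until it holds or the timeout elapses.
    // Returns true if the condition became true in time.
    static boolean waitFor(long timeoutMillis, long intervalMillis, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean archived = new AtomicBoolean(false);
        // Simulated asynchronous archiving that finishes a little later.
        Thread archiver = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            archived.set(true);
        });
        archiver.start();
        System.out.println("archived in time: " + waitFor(2000, 10, archived::get));
        archiver.join();
    }
}
```

Polling only works around the symptom. The more general problem noted in the description, that the client is not told when an async operation completes, remains.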
[jira] [Commented] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605603#comment-13605603 ] Chris Trezzo commented on HBASE-7568: - Hmm, it compiled locally. Will investigate. Thanks Ted. Chris [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7803) Look into REST API performance
[ https://issues.apache.org/jira/browse/HBASE-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605605#comment-13605605 ] Hadoop QA commented on HBASE-7803: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574197/trunk-7803.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4870//console This message is automatically generated. Look into REST API performance -- Key: HBASE-7803 URL: https://issues.apache.org/jira/browse/HBASE-7803 Project: HBase Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: trunk-7803.patch I have a YCSB client using the REST API. My testing shows the performance for scan with REST API is much worse than that with the java client API. We need to look into it and find out the root cause, either the test issue, or our REST API issue. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605609#comment-13605609 ] Chris Trezzo commented on HBASE-7568: - Woops. Posted the wrong file when I renamed it. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HBASE-7568: Attachment: (was: HBASE-7568-trunk-v1.patch) [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8127) Region of a disabling or disabled table could be stuck in transition state when RS dies during Master initialization
[ https://issues.apache.org/jira/browse/HBASE-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605616#comment-13605616 ] Jeffrey Zhong commented on HBASE-8127: -- [~rajesh23] Thanks for the detailed comments. {quote} If I am not wrong HBASE-7824 patch applied at that time right? {quote} No. Actually, with {code}failedServers.remove(preMetaServer);{code} we don't see any issue at all. The only problem is when we have non-empty dead servers, which are simulated by the reproduce-hang patch. Anyway, the opening RIT of the disabled table that causes the issue is on the live RS, not the one that dies (or is aborted) in the test. So the changes in SSH should not have any impact, IMHO. Region of a disabling or disabled table could be stuck in transition state when RS dies during Master initialization Key: HBASE-8127 URL: https://issues.apache.org/jira/browse/HBASE-8127 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.94.7 Attachments: HBASE-8127_feedback.patch, HBASE-8127.patch, hbase-8127_v1.patch, reproduce-hang.patch The issue happens when an RS dies while a master starts up. After the RS reports open to the new master instance and dies immediately thereafter, the RITs of disabling (or disabled) tables on the dead RS will be in RIT state forever. I attached a patch to simulate the situation, and you can run the following command to reproduce the issue: {code}mvn test -PlocalTests -Dtest=TestMasterFailover#testMasterFailoverWithMockedRITOnDeadRS{code} Basically, we skip regions of a dead server inside AM.processDeadServersAndRecoverLostRegions, as in the following code, and rely on SSH to process those skipped regions: {code} for (Pair<HRegionInfo, Result> deadRegion : deadServer.getValue()) { nodes.remove(deadRegion.getFirst().getEncodedName()); } {code} While in SSH, we skip regions of disabling (or disabled) tables again in the function processDeadRegion.
The end result is that RITs of disabling (or disabled) tables stay stuck there forever.
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605621#comment-13605621 ] Lars Hofhansl commented on HBASE-7295: -- I know we went through this before, but just making the PoolMap volatile does not make the implementation thread safe. Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.95.0 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would time out after maxIdleTime - underlying PoolMap connections object is thread safe.
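To illustrate the comment above: declaring a field volatile only guarantees that writes to the reference are visible to other threads. A check-then-create sequence on the map it points to is still a race. The Java sketch below shows one standard alternative, an atomic per-key `computeIfAbsent` on a `ConcurrentHashMap`, which avoids a global lock. It is not the HBase PoolMap or the patch under review. `ConnectionCacheSketch`, `Connection`, and the address string are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCacheSketch {
    // Hypothetical stand-in for a client connection.
    static final class Connection {
        final String address;
        Connection(String address) { this.address = address; }
    }

    private final ConcurrentHashMap<String, Connection> connections = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger();

    // Looks up or creates the connection for an address. ConcurrentHashMap
    // applies the mapping function at most once per key, so concurrent
    // callers for the same address share one Connection.
    Connection getConnection(String address) {
        return connections.computeIfAbsent(address, a -> {
            created.incrementAndGet();
            return new Connection(a);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[16];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // release all threads at once
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.getConnection("rs1:60020");
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("connections created: " + cache.created.get()); // prints 1
    }
}
```

A real connection pool holds several connections per key and has to expire idle ones, so this only demonstrates the atomicity point, not a replacement design.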
[jira] [Commented] (HBASE-6915) String and ConcurrentHashMap sizes change on jdk7; makes TestHeapSize fail
[ https://issues.apache.org/jira/browse/HBASE-6915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605631#comment-13605631 ] Lars Hofhansl commented on HBASE-6915: -- +1 for 0.94 as well. String and ConcurrentHashMap sizes change on jdk7; makes TestHeapSize fail -- Key: HBASE-6915 URL: https://issues.apache.org/jira/browse/HBASE-6915 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: jdk7.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8014) Backport HBASE-6915 to 0.94.
[ https://issues.apache.org/jira/browse/HBASE-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605633#comment-13605633 ] Ted Yu commented on HBASE-8014: --- Here is Lars' confirmation: https://issues.apache.org/jira/browse/HBASE-6915?focusedCommentId=13605631page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13605631 Backport HBASE-6915 to 0.94. Key: HBASE-8014 URL: https://issues.apache.org/jira/browse/HBASE-8014 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Critical Attachments: HBASE-8014-v0-0.94.patch JDK 1.7 changed some data size. Goal of this JIRA is to backport HBASE-6915 to 0.94 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8141) Remove accidental uses of org.mortbay.log
Andrew Purtell created HBASE-8141: - Summary: Remove accidental uses of org.mortbay.log Key: HBASE-8141 URL: https://issues.apache.org/jira/browse/HBASE-8141 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.96.0, 0.94.6 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Remove accidental uses of org.mortbay.log.Log. Eclipse autocomplete is probably the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8014) Backport HBASE-6915 to 0.94.
[ https://issues.apache.org/jira/browse/HBASE-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605639#comment-13605639 ] Ted Yu commented on HBASE-8014: --- Integrated to 0.94 Thanks for the patch, Jean-Marc. Thanks for the confirmation, Lars. Backport HBASE-6915 to 0.94. Key: HBASE-8014 URL: https://issues.apache.org/jira/browse/HBASE-8014 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Critical Attachments: HBASE-8014-v0-0.94.patch JDK 1.7 changed some data size. Goal of this JIRA is to backport HBASE-6915 to 0.94 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8014) Backport HBASE-6915 to 0.94.
[ https://issues.apache.org/jira/browse/HBASE-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8014: -- Fix Version/s: 0.94.7 Backport HBASE-6915 to 0.94. Key: HBASE-8014 URL: https://issues.apache.org/jira/browse/HBASE-8014 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jean-Marc Spaggiari Priority: Critical Fix For: 0.94.7 Attachments: HBASE-8014-v0-0.94.patch JDK 1.7 changed some data size. Goal of this JIRA is to backport HBASE-6915 to 0.94 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8141) Remove accidental uses of org.mortbay.log.Log
[ https://issues.apache.org/jira/browse/HBASE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-8141: -- Summary: Remove accidental uses of org.mortbay.log.Log (was: Remove accidental uses of org.mortbay.log) Remove accidental uses of org.mortbay.log.Log - Key: HBASE-8141 URL: https://issues.apache.org/jira/browse/HBASE-8141 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.96.0, 0.94.6 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Remove accidental uses of org.mortbay.log.Log. Eclipse autocomplete is probably the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605644#comment-13605644 ] Ted Yu commented on HBASE-7055: --- I am going over the patch. Can you update Release Notes ? There're a lot of config parameters introduced in this patch. port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes) -- Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Components: Compaction Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.95.0 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, HBASE-6371-v3-refactor-only-squashed.patch, HBASE-6371-v4-refactor-only-squashed.patch, HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, HBASE-7055-v7.patch, HBASE-7055-v7.patch See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-8141) Remove accidental uses of org.mortbay.log.Log
[ https://issues.apache.org/jira/browse/HBASE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell resolved HBASE-8141. --- Resolution: Fixed Fix Version/s: 0.94.6 0.96.0 0.95.0 Remove accidental uses of org.mortbay.log.Log - Key: HBASE-8141 URL: https://issues.apache.org/jira/browse/HBASE-8141 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.96.0, 0.94.6 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 0.95.0, 0.96.0, 0.94.6 Attachments: 8141-0.94.patch, 8141-trunk.patch Remove accidental uses of org.mortbay.log.Log. Eclipse autocomplete is probably the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8141) Remove accidental uses of org.mortbay.log.Log
[ https://issues.apache.org/jira/browse/HBASE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-8141: -- Attachment: 8141-0.94.patch 8141-trunk.patch Trivial patches committed. Remove accidental uses of org.mortbay.log.Log - Key: HBASE-8141 URL: https://issues.apache.org/jira/browse/HBASE-8141 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.96.0, 0.94.6 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 0.95.0, 0.96.0, 0.94.6 Attachments: 8141-0.94.patch, 8141-trunk.patch Remove accidental uses of org.mortbay.log.Log. Eclipse autocomplete is probably the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605664#comment-13605664 ] Varun Sharma commented on HBASE-7295: - Lars, I may be forgetting, but is it because of the edge cases with PoolMap thread safety, or is it the Connection object thread safety, or is it because of the double-checked locking issue in general? Thanks Varun Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.95.0 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would time out after maxIdleTime - underlying PoolMap connections object is thread safe.
[jira] [Commented] (HBASE-7679) implement store file management for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605678#comment-13605678 ] Hadoop QA commented on HBASE-7679: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574205/HBASE-7679-v10.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4872//console This message is automatically generated. 
implement store file management for stripe compactions -- Key: HBASE-7679 URL: https://issues.apache.org/jira/browse/HBASE-7679 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v0-incomplete.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-and-7603-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v1.patch, HBASE-7667-v2.patch, HBASE-7667-v2.patch, HBASE-7667-v3.patch, HBASE-7679-v10.patch, HBASE-7679-v4.patch, HBASE-7679-v5.patch, HBASE-7679-v6.patch, HBASE-7679-v7-.patch, HBASE-7679-v7.patch, HBASE-7679-v8.patch, HBASE-7679-v9.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8108) Add m2eclispe lifecycle mapping to hbase-common
[ https://issues.apache.org/jira/browse/HBASE-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates updated HBASE-8108: --- Summary: Add m2eclispe lifecycle mapping to hbase-common (was: Add m2eclispe lifecycle mapping to hbase-commn) Add m2eclispe lifecycle mapping to hbase-common --- Key: HBASE-8108 URL: https://issues.apache.org/jira/browse/HBASE-8108 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.95.0, 0.98.0 Reporter: Jesse Yates Assignee: Jesse Yates Attachments: hbase-8108.patch, hbase-8108-v2.patch The maven-antrun-plugin execution doesn't have a default mapping in m2eclipse, so if you import the project into eclipse, you will get an error that the mapping is undefined. All that's needed is to define an execution via the org.eclipse.m2 lifecycle-mapping plugin - it doesn't actually affect the usual maven build at all. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-8108) Add m2eclispe lifecycle mapping to hbase-common
[ https://issues.apache.org/jira/browse/HBASE-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesse Yates resolved HBASE-8108. Resolution: Fixed Fix Version/s: 0.98.0 0.95.0 committed to trunk and 0.95. Thanks for the reviews! Add m2eclispe lifecycle mapping to hbase-common --- Key: HBASE-8108 URL: https://issues.apache.org/jira/browse/HBASE-8108 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.95.0, 0.98.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.95.0, 0.98.0 Attachments: hbase-8108.patch, hbase-8108-v2.patch The maven-antrun-plugin execution doesn't have a default mapping in m2eclipse, so if you import the project into eclipse, you will get an error that the mapping is undefined. All that's needed is to define an execution via the org.eclipse.m2 lifecycle-mapping plugin - it doesn't actually affect the usual maven build at all. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-7590: --- Status: Open (was: Patch Available) Add a costless notifications mechanism from master to regionservers clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7590.inprogress.patch, 7590.v12.patch, 7590.v12.patch, 7590.v13.patch, 7590.v1.patch, 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch, 7590.v5.patch, 7590.v5.patch It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow: - to lower the load on the system, without clients using stale information and going to dead machines - to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can take the right decision, and continue/stop to wait accordingly. We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retry delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so. - receivers should not depend on this: if the information is available, great. If not, it should not break anything. - it should be optional. So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket.
If the socket is not configured, it does not do anything. On the client side, when we receive information that a node is dead, we refresh the cache about it.
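The proposal above can be sketched roughly as follows in Java, under several loud assumptions: the payload is a newline-separated UTF-8 list rather than the protobuf message the description mentions, the group address and port passed to `publish` are whatever the caller configures, and the client cache is a plain region-to-server map. None of the names below come from the attached patches.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class DeadServerNotifierSketch {
    // Stand-in wire format: one server name per line (the proposal uses protobuf).
    static byte[] encode(List<String> deadServers) {
        return String.join("\n", deadServers).getBytes(StandardCharsets.UTF_8);
    }

    static List<String> decode(byte[] data, int length) {
        String text = new String(data, 0, length, StandardCharsets.UTF_8);
        List<String> servers = new ArrayList<>();
        for (String server : text.split("\n")) {
            if (!server.isEmpty()) {
                servers.add(server);
            }
        }
        return servers;
    }

    // Master side: one best-effort datagram to the multicast group.
    // No connection per receiver and no acknowledgement are involved.
    static void publish(List<String> deadServers, String group, int port) throws IOException {
        byte[] payload = encode(deadServers);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(group), port));
        }
    }

    // Client side: drop cached locations that point at servers reported dead.
    static void onNotification(byte[] data, int length, Map<String, String> regionToServer) {
        List<String> dead = decode(data, length);
        regionToServer.values().removeIf(dead::contains);
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        cache.put("region-a", "rs1:60020");
        cache.put("region-b", "rs2:60020");
        // publish(...) is not called here because it needs a multicast-capable network.
        byte[] payload = encode(Arrays.asList("rs2:60020"));
        onNotification(payload, payload.length, cache);
        System.out.println(cache); // only region-a remains
    }
}
```

Because receivers must not depend on the notification, a lost datagram in this sketch simply means the client falls back to its normal timeout path.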
[jira] [Updated] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-7590: --- Attachment: 7590.v13.patch Add a costless notifications mechanism from master to regionservers clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7590.inprogress.patch, 7590.v12.patch, 7590.v12.patch, 7590.v13.patch, 7590.v1.patch, 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch, 7590.v5.patch, 7590.v5.patch t would be very useful to add a mechanism to distribute some information to the clients and regionservers. Especially It would be useful to know globally (regionservers + clients apps) that some regionservers are dead. This would allow: - to lower the load on the system, without clients using staled information and going on dead machines - to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information separatly about a region server states, it can take the right decision, and continue/stop to wait accordingly. We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retries delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with once packet every 10 seconds or so. - receivers should not depend on this: if the information is available great. If not, it should not break anything. - it should be optional. So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. 
If the socket is not configured, it does not do anything. On the client side, when we receive an information that a node is dead, we refresh the cache about it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605698#comment-13605698 ] nkeywal commented on HBASE-7590: May be 13 is going to be my lucky number :-) ? Add a costless notifications mechanism from master to regionservers clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7590.inprogress.patch, 7590.v12.patch, 7590.v12.patch, 7590.v13.patch, 7590.v1.patch, 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch, 7590.v5.patch, 7590.v5.patch t would be very useful to add a mechanism to distribute some information to the clients and regionservers. Especially It would be useful to know globally (regionservers + clients apps) that some regionservers are dead. This would allow: - to lower the load on the system, without clients using staled information and going on dead machines - to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information separatly about a region server states, it can take the right decision, and continue/stop to wait accordingly. We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retries delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with once packet every 10 seconds or so. - receivers should not depend on this: if the information is available great. If not, it should not break anything. - it should be optional. 
So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive an information that a node is dead, we refresh the cache about it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nkeywal updated HBASE-7590: --- Status: Patch Available (was: Open) Add a costless notifications mechanism from master to regionservers clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7590.inprogress.patch, 7590.v12.patch, 7590.v12.patch, 7590.v13.patch, 7590.v1.patch, 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch, 7590.v5.patch, 7590.v5.patch t would be very useful to add a mechanism to distribute some information to the clients and regionservers. Especially It would be useful to know globally (regionservers + clients apps) that some regionservers are dead. This would allow: - to lower the load on the system, without clients using staled information and going on dead machines - to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information separatly about a region server states, it can take the right decision, and continue/stop to wait accordingly. We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retries delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with once packet every 10 seconds or so. - receivers should not depend on this: if the information is available great. If not, it should not break anything. - it should be optional. So at the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. 
If the socket is not configured, it does not do anything. On the client side, when we receive an information that a node is dead, we refresh the cache about it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7965) Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933)
[ https://issues.apache.org/jira/browse/HBASE-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605707#comment-13605707 ] Jonathan Hsieh commented on HBASE-7965: --- I think it is unfair to claim that the ability to change schema without disabling the table is a feature that is required for HBase to be production ready. The feature is off by default and essentially documented as experimental ({{Its off by default. Enable it at your own risk.}} [1]), so in my eyes fixing it essentially feels like a new feature. [1] http://hbase.apache.org/book.html#d1949e2910 (Sorry for the delay, I was away for 2 weeks.) Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933) --- Key: HBASE-7965 URL: https://issues.apache.org/jira/browse/HBASE-7965 Project: HBase Issue Type: New Feature Components: master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.7 Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933). This is a new feature, but there has been some interest, and it is necessary for snapshots and online merge, which are also candidates for backport. If we port snapshots, we might need HBASE-7848 as well. We can also make it disabled by default.
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605711#comment-13605711 ] Lars Hofhansl commented on HBASE-7295: -- Double-checked locking is fine when the variable checked is declared volatile (i.e. it ensures proper read/write memory barriers). Here PoolMap itself would have to be thread-safe, which - as far as I know - it is not. Also, in the uncontended case an access to a volatile is not significantly cheaper than a synchronized statement, so I doubt that even if it was correct it would actually improve the situation ... unless you see extremely high contention on this lock. Do you have sample code that can reproduce the problem? Until then I'm -1 on this change. (sorry) Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.95.0 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would time out after maxIdleTime - underlying PoolMap connections object is thread safe.
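For readers unfamiliar with the pattern discussed in the comment above, here is a minimal sketch of double-checked locking with a volatile field. It is not the HBASE-7295 patch: ExpensiveResource and LazyResourceHolder are hypothetical names used only for illustration. Note that volatile protects only the publication of the reference; it does nothing for the thread-safety of the object the reference points to, which is the concern raised about PoolMap.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for something costly to create; counts constructions. */
final class ExpensiveResource {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveResource() {
        CREATED.incrementAndGet();
    }
}

final class LazyResourceHolder {
    // volatile gives the write in the synchronized block a happens-before
    // relationship with later unsynchronized reads, so no thread can see a
    // partially constructed object.
    private volatile ExpensiveResource resource;

    ExpensiveResource get() {
        ExpensiveResource local = resource;   // first check, no lock taken
        if (local == null) {
            synchronized (this) {
                local = resource;             // second check, under the lock
                if (local == null) {
                    local = new ExpensiveResource();
                    resource = local;
                }
            }
        }
        return local;
    }
}
```

Once the field is initialized, callers pay only for a volatile read. As the comment points out, that read is not dramatically cheaper than an uncontended synchronized block, so the pattern mainly helps when the lock is heavily contended.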
[jira] [Updated] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7597: --- Attachment: trunk-7597.patch TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Attachments: trunk-7597.patch I ran the entire test suite many times and always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attached more result when current tests are done. Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7597: --- Status: Patch Available (was: Reopened) In the log file, it shows the region is not deployed according to hbck although it is. I added some checking (the same way as in hbck) to make sure the region is deployed before running hbck. TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Attachments: trunk-7597.patch I ran the entire test suite many times and always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attached more result when current tests are done. Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8135) Mutation should implement HeapSize
[ https://issues.apache.org/jira/browse/HBASE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8135: -- Attachment: 8135-v3.txt Patch v3 makes TestHeapSize pass. Put has already been covered in TestHeapSize#testSizes() Mutation should implement HeapSize -- Key: HBASE-8135 URL: https://issues.apache.org/jira/browse/HBASE-8135 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.0, 0.96.0, 0.94.5 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.95.0, 0.96.0 Attachments: 8135.v1.patch, 8135.v2.patch, 8135-v3.txt Code is there already. Doing so would allow to share some code when doing client side buffering. patch compiles locally, should not impact tests, but not tested locally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7965) Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933)
[ https://issues.apache.org/jira/browse/HBASE-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605718#comment-13605718 ] Lars Hofhansl commented on HBASE-7965: -- Welcome back Jon :) I do not think it is question about fair vs. unfair. It is a fact that you cannot safely do online schema changes in 0.94. When we have an actual patch against 0.94 we can weigh that deficiency against the risk introduced by the patch. Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933) --- Key: HBASE-7965 URL: https://issues.apache.org/jira/browse/HBASE-7965 Project: HBase Issue Type: New Feature Components: master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.94.7 Port table locking to 0.94 (HBASE-7305, HBASE-7546, HBASE-7933). This is a new feature, but there has been some interest, and it is necessary for snapshots, and online merge, which are also candidates for backport. If we port snapshots, we might need HBASE-7848 as well. We can also do disabled by default. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8135) Mutation should implement HeapSize
[ https://issues.apache.org/jira/browse/HBASE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8135: -- Status: Patch Available (was: Open) Mutation should implement HeapSize -- Key: HBASE-8135 URL: https://issues.apache.org/jira/browse/HBASE-8135 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.94.5, 0.95.0, 0.96.0 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.95.0, 0.96.0 Attachments: 8135.v1.patch, 8135.v2.patch, 8135-v3.txt Code is there already. Doing so would allow to share some code when doing client side buffering. patch compiles locally, should not impact tests, but not tested locally. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HBASE-7568: Attachment: HBASE-7568-trunk-v1.patch Attached re-based patch to incorporate new test in TestReplicationSourceManager. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605729#comment-13605729 ] stack commented on HBASE-7295: -- This doesn't make sense:
{code}
- protected final PoolMap<ConnectionId, Connection> connections;
+ protected volatile PoolMap<ConnectionId, Connection> connections;
{code}
This is http://en.wikipedia.org/wiki/Double-checked_locking No weird errors/connection fails in your thrift gateway? PoolMap looks like it is backed by a concurrent hash map, which would be fine on the gets, etc., but the iterations are not synchronized (I don't see connections being iterated, but they probably are someplace if I looked more). We committed double-checked locking around the block cache a while ago: https://issues.apache.org/jira/secure/attachment/12553266/5898-v4.txt Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.95.0 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would time out after maxIdleTime - underlying PoolMap connections object is thread safe.
[jira] [Commented] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605732#comment-13605732 ] stack commented on HBASE-7597: -- +1 IMO, exploratory/debug code is fine to commit while trying to figure out what's up on Jenkins (since it's hard to reproduce its context elsewhere). TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Attachments: trunk-7597.patch I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done. Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher)
[jira] [Updated] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7905: - Status: Open (was: Patch Available) Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7905: - Attachment: testipc_for_pre_cellblocks.txt Add main to testipc for current trunk, before this patch goes in. Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations
[ https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605746#comment-13605746 ] Jonathan Hsieh commented on HBASE-7305: --- The doc is great. I'm most curious about why different operations get the read or the write aspect of the lock, and what each one protects. I'm trying to justify this to myself now based on the docs. So, do I have this right?
Affected operations:
* create, delete, disable, enable, alter, modify table (add/del/mod col, mod table), splits
* Other candidates: merge, snapshot, ... balancer, am, ssh, hbck
Current rationale:
* want to allow safe table mods (disable, enable, alter)
* want to allow concurrent splits
* want snapshot operations to be safe
Implementation:
* Read locks on splits.
* Exclusive write lock on all other table mods.
Questions/Observations:
* This primarily protects operations that clash with table-level enable/disable/alter, but not region-level operations, right?
* This doesn't guard meta from individual changes, right? It only protects meta from bulk adds (create/delete table). Thus this shouldn't affect region moves or region closes/opens.
* Protecting split with a read table lock only prevents alter/enable/disable table ops from happening. If an overlapping merge and split were issued, some other mechanism is in place to keep this sane, right? This doesn't protect multiple merge requests with overlapping regions, right?
* Merges will likely want the read lock? (allowing multiple concurrent merges, and assuming some overlap sanity protection from a different mechanism).
* With snapshots, this mechanism doesn't prevent regions from moving, so it only protects snapshots from happening concurrently with enable/disable/alter table ops. A snapshot will still fail if it gets caught while the balancer is running.
* These locks don't really help hbck, except for the cases where enable/disable/alter operations are going on as hbck repairs things. (It wouldn't protect hbck from the balancer.) As a strawman (for follow-on work), I'm thinking that for assignment-dependent operations (splits/balancer/ssh/snapshots/merge) we might want another lock (I believe regions-in-transition kind of serves this purpose already).
* Does having a table lock (and then having individual region locks that require a table read lock being held) make sense? Maybe this makes sense for merges and splits?
ZK based Read/Write locks for table operations -- Key: HBASE-7305 URL: https://issues.apache.org/jira/browse/HBASE-7305 Project: HBase Issue Type: Bug Components: Client, master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.0 Attachments: 130228-zkrwlocks.pdf, 7305-v11.txt, hbase-7305_v0.patch, hbase-7305_v10.patch, hbase-7305_v13.patch, hbase-7305_v14.patch, hbase-7305_v15.patch, hbase-7305_v1-based-on-curator.patch, hbase-7305_v2.patch, hbase-7305_v4.patch, hbase-7305_v9.patch, HBaseTableLocks.pdf This started as a forward port of HBASE-5494 and HBASE-5991 from the 89-fb branch to trunk, but diverged enough to have its own issue. The idea is to implement a zk based read/write lock per table. Master initiated operations should get the write lock, and region operations (region split, moving, balance?, etc) acquire a shared read lock.
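The read/write split described in this thread (region operations such as splits share a read lock, table-level operations take an exclusive write lock) can be illustrated with an in-process analogue. The sketch below uses java.util.concurrent's ReentrantReadWriteLock purely to show the semantics; HBASE-7305 implements the lock in ZooKeeper, and the class name TableLockSketch and its methods are hypothetical.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** In-process analogue of a per-table read/write lock (not the ZooKeeper implementation). */
final class TableLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Region-level operation such as a split: shared lock, so many can run at once. */
    boolean trySplit(Runnable split) {
        if (!lock.readLock().tryLock()) {
            return false;   // a table-level operation currently holds the write lock
        }
        try {
            split.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Table-level operation such as alter/enable/disable: exclusive lock. */
    boolean tryAlter(Runnable alter) {
        if (!lock.writeLock().tryLock()) {
            return false;   // splits or another table-level operation are in progress
        }
        try {
            alter.run();
            return true;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, two splits issued from different threads can overlap, while an alter is refused for as long as any split holds the read lock. As the questions above note, that alone says nothing about two region operations that touch overlapping regions; that would need a separate mechanism.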
[jira] [Updated] (HBASE-8081) Backport HBASE-7213 (separate hlog for meta tables) to 0.94
[ https://issues.apache.org/jira/browse/HBASE-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated HBASE-8081: --- Attachment: 7213-0.94-with-config-1.patch This is essentially the same patch as the last one with one minor change - added the new config in hbase-default.xml. Passes the unit tests with the option enabled. Also ran manual tests on a cluster with the config on/off. Things looked good. Backport HBASE-7213 (separate hlog for meta tables) to 0.94 --- Key: HBASE-8081 URL: https://issues.apache.org/jira/browse/HBASE-8081 Project: HBase Issue Type: Bug Affects Versions: 0.94.5 Reporter: Devaraj Das Assignee: Devaraj Das Fix For: 0.94.7 Attachments: 7213-0.94-2.patch, 7213-0.94-3.patch, 7213-0.94.patch, 7213-0.94-with-config-1.patch, 7213-0.94-with-config.patch I am interested in backporting HBASE-7213 to 0.94. Helps to address more of the MTTR story. Offline discussion with Lars indicated he is interested as well. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations
[ https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605761#comment-13605761 ] Sergey Shelukhin commented on HBASE-7305: -
bq. having individual region locks that require a table read lock being held
I wonder if the region lock approach would scale. I can accept that splits are infrequent enough not to introduce too much delay to table operations, but if every AM action blocks every table operation, I think it will not scale beyond small or medium clusters. I think we should be able to use a better approach... table updates on modified regions can be done after modification. ZK based Read/Write locks for table operations -- Key: HBASE-7305 URL: https://issues.apache.org/jira/browse/HBASE-7305 Project: HBase Issue Type: Bug Components: Client, master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.0 Attachments: 130228-zkrwlocks.pdf, 7305-v11.txt, hbase-7305_v0.patch, hbase-7305_v10.patch, hbase-7305_v13.patch, hbase-7305_v14.patch, hbase-7305_v15.patch, hbase-7305_v1-based-on-curator.patch, hbase-7305_v2.patch, hbase-7305_v4.patch, hbase-7305_v9.patch, HBaseTableLocks.pdf This started as a forward port of HBASE-5494 and HBASE-5991 from the 89-fb branch to trunk, but diverged enough to have its own issue. The idea is to implement a zk based read/write lock per table. Master initiated operations should get the write lock, and region operations (region split, moving, balance?, etc) acquire a shared read lock.
[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers clients
[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605762#comment-13605762 ] Hadoop QA commented on HBASE-7590: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574232/7590.v13.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 32 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4873//console This message is automatically generated. Add a costless notifications mechanism from master to regionservers clients - Key: HBASE-7590 URL: https://issues.apache.org/jira/browse/HBASE-7590 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7590.inprogress.patch, 7590.v12.patch, 7590.v12.patch, 7590.v13.patch, 7590.v1.patch, 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch, 7590.v5.patch, 7590.v5.patch It would be very useful to add a mechanism to distribute some information to the clients and regionservers. 
In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow: - to lower the load on the system, without clients using stale information and going to dead machines - to make the recovery faster from a client point of view. It's common to use large timeouts on the client side, so the client may need a lot of time before declaring a region server dead and trying another one. If the client receives the information about a region server's state separately, it can make the right decision and continue or stop waiting accordingly. We can also send more information, for example instructions like 'slow down' to instruct the client to increase the retry delay and so on. Technically, the master could send this information. To lower the load on the system, we should: - have a multicast communication (i.e. the master does not have to connect to all servers by tcp), with one packet every 10 seconds or so. - receivers should not depend on this: if the information is available, great. If not, it should not break anything. - it should be optional. So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does not do anything. On the client side, when we receive an information that
[jira] [Commented] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605772#comment-13605772 ] Hadoop QA commented on HBASE-7597: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574237/trunk-7597.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4874//console This message is automatically generated. TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Attachments: trunk-7597.patch I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done. 
Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605783#comment-13605783 ] stack commented on HBASE-7905: -- Ran dumb test to compare before and after. Test is too dumb though because for the current trunk, it does not include the cost of building the protobuf whereas it includes the cost of building the CellBlock that this patch adds. Given that, here is what I have: I added a main to TestIPC and then just did nought but pass KVs and compared current trunk to what this patch adds. Main difference between before and after is before takes a single Message param into which all data has already been serialized. The after -- i.e. cellblocks -- takes a param and a CellScanner from which it then internally composes a CellBlock to pass over the wire... so the new stuff does composition iterating all Cells to compose in memory a block to send behind the rpc (it is part of what is being measured, whereas with the before, the building of the object is not measured). The server puts what it receives back on the wire as a return. Running a test sending a single KV there and back 1M times has before and after taking about the same time. BEFORE: 13/03/18 14:50:38 INFO ipc.TestIPC: Cycled 100 time(s) with 1 cell(s) in 101236ms AFTER: 13/03/18 14:25:15 INFO ipc.TestIPC: Cycled 100 time(s) with 1 cell(s) in 103746ms If I do more Cells, say 100, they diverge more: BEFORE: 13/03/18 13:31:09 INFO ipc.TestIPC: Cycled 100 time(s) with 100 cell(s) in 113950ms AFTER: 13/03/18 13:40:58 INFO ipc.TestIPC: Cycled 100 time(s) with 100 cell(s) in 128230ms ~8% We should add another ~8% for server-side iteration undoing the cellblock which this test does not do. If I do 1000 cells, we go up to about 60% (double that if server is doing iterations on its side). Let me redo the test so it's a bit more of a fair comparison. 
Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605787#comment-13605787 ] Jimmy Xiang commented on HBASE-7597: I ran the test several times locally and it is green. I checked it in for trunk and 0.95. Let's keep this open for a while to see if the problem happens again. TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Attachments: trunk-7597.patch I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done. Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-7597: --- Fix Version/s: 0.98.0 0.95.0 TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky - Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Fix For: 0.95.0, 0.98.0 Attachments: trunk-7597.patch I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done. Failed tests: testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:2 but was:4 testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:[SHOULD_NOT_BE_DEPLOYED] but was:[] testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7255) KV size metric went missing from StoreScanner.
[ https://issues.apache.org/jira/browse/HBASE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605790#comment-13605790 ] Ted Yu commented on HBASE-7255: --- If I understand the current implementation correctly, a MetricsStoreSource should be added under hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver and implementations would be added under: ./hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java KV size metric went missing from StoreScanner. -- Key: HBASE-7255 URL: https://issues.apache.org/jira/browse/HBASE-7255 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Elliott Clark Priority: Critical Fix For: 0.95.0 In trunk due to the metric refactor, at least the KV size metric went missing. See this code in StoreScanner.java: {code} } finally { if (cumulativeMetric > 0 && metric != null) { } } {code} Just an empty if statement, where the metric used to be collected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (HBASE-7255) KV size metric went missing from StoreScanner.
[ https://issues.apache.org/jira/browse/HBASE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605790#comment-13605790 ] Ted Yu edited comment on HBASE-7255 at 3/18/13 11:23 PM: - If I understand the current implementation correctly, a MetricsStoreSource should be added under hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver and implementations would be added under: ./hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java was (Author: yuzhih...@gmail.com): If I understand the current implementation correctly, a MetricsStoreSource should be added hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/regionserver and implementations would be added: ./hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java ./hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/regionserver/MetricsStoreSourceImpl.java KV size metric went missing from StoreScanner. -- Key: HBASE-7255 URL: https://issues.apache.org/jira/browse/HBASE-7255 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Elliott Clark Priority: Critical Fix For: 0.95.0 In trunk due to the metric refactor, at least the KV size metric went missing. See this code in StoreScanner.java: {code} } finally { if (cumulativeMetric > 0 && metric != null) { } } {code} Just an empty if statement, where the metric used to be collected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
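As a rough illustration of the layout suggested in the comment (an interface in the compat module, one implementation per Hadoop profile) and of a finally block that actually records the accumulated size, here is a minimal standalone sketch. It is not HBase code: the method name updateKvSize, the AtomicLong counter, and the ScanLoopSketch class are assumptions made for the example, and the real metrics classes may look quite different.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical shape of the interface; updateKvSize is an assumed name.
interface MetricsStoreSource {
    void updateKvSize(long bytes);
}

// Stand-in for one per-profile implementation; a real one would publish
// to the Hadoop metrics system instead of a local counter.
class MetricsStoreSourceImpl implements MetricsStoreSource {
    private final AtomicLong kvBytes = new AtomicLong();

    @Override
    public void updateKvSize(long bytes) {
        kvBytes.addAndGet(bytes);
    }

    long totalKvBytes() {
        return kvBytes.get();
    }
}

class ScanLoopSketch {
    // Accumulates KV sizes while scanning and reports them once at the end,
    // so the guarded block in the finally clause is no longer empty.
    static void scan(long[] kvSizes, MetricsStoreSource metric) {
        long cumulativeMetric = 0;
        try {
            for (long size : kvSizes) {
                cumulativeMetric += size;
            }
        } finally {
            if (cumulativeMetric > 0 && metric != null) {
                metric.updateKvSize(cumulativeMetric);
            }
        }
    }
}
```

Reporting once per scan call rather than once per KV keeps the metric update off the per-cell path; whether that matches the pre-refactor behaviour is not stated in the issue.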
[jira] [Commented] (HBASE-8135) Mutation should implement HeapSize
[ https://issues.apache.org/jira/browse/HBASE-8135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605793#comment-13605793 ] Hadoop QA commented on HBASE-8135: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574238/8135-v3.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4875//console This message is automatically generated. Mutation should implement HeapSize -- Key: HBASE-8135 URL: https://issues.apache.org/jira/browse/HBASE-8135 Project: HBase Issue Type: Bug Components: Client Affects Versions: 0.95.0, 0.96.0, 0.94.5 Reporter: nkeywal Assignee: nkeywal Priority: Minor Fix For: 0.95.0, 0.96.0 Attachments: 8135.v1.patch, 8135.v2.patch, 8135-v3.txt Code is there already. Doing so would allow to share some code when doing client side buffering. patch compiles locally, should not impact tests, but not tested locally. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7680) implement compaction policy for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7680: Attachment: HBASE-7680-v3-with-7679.patch HBASE-7680-v3.patch After discussion in HBASE-8034, redid all the reasonable places to use KV count instead of size for splitting. Also rebased. implement compaction policy for stripe compactions -- Key: HBASE-7680 URL: https://issues.apache.org/jira/browse/HBASE-7680 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7680-v0.patch, HBASE-7680-v0-with-7679-and-7935.patch, HBASE-7680-v1.patch, HBASE-7680-v1-with-7679.patch, HBASE-7680-v2.patch, HBASE-7680-v2-with-7679-and-8034.patch, HBASE-7680-v3.patch, HBASE-7680-v3-with-7679.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-8142) Sporadic TestZKProcedureControllers failures on trunk
stack created HBASE-8142: Summary: Sporadic TestZKProcedureControllers failures on trunk Key: HBASE-8142 URL: https://issues.apache.org/jira/browse/HBASE-8142 Project: HBase Issue Type: Bug Reporter: stack See https://builds.apache.org/job/PreCommit-HBASE-Build/4865//artifact/trunk/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.procedure.TestZKProcedureControllers.txt and https://builds.apache.org/job/PreCommit-HBASE-Build/4865//artifact/trunk/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.procedure.TestZKProcedureControllers-output.txt I see this in the output: {code} 2013-03-18 17:30:46,672 DEBUG [Thread-2-EventThread] zookeeper.ZKUtil(1682): testing utility-0x13d7e8da759 Retrieved 0 byte(s) of data from znode /hbase/testSimple/acquired/instanceTest; data=empty 2013-03-18 17:30:46,672 DEBUG [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(206): start proc data length is 0 2013-03-18 17:30:46,672 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(210): Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 2013-03-18 17:30:46,673 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(218): Illegal argument exception java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 
at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2013-03-18 17:30:46,675 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(281): Failed due to null subprocedure java.lang.IllegalArgumentException via expected:java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:219) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) ... 6 more {code} The znode has zero data (Usually it has 7 bytes when test runs fine). 
Is the latch being triggered on node create before data is written? Pointers appreciated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
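The question raised in HBASE-8142 (is the watcher firing on node creation before the data is written?) can be illustrated with a toy simulation. This is not the ZooKeeper client API and not the ZKProcedure code; FakeZnodeStore and ZnodeRaceDemo are hypothetical names, and the sketch only shows why a listener triggered by creation would see 0 bytes if the node were created empty and filled in afterwards, which is the hypothesis in the report rather than a confirmed cause.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Toy in-memory stand-in for a znode tree; NOT the ZooKeeper client API.
class FakeZnodeStore {
    private final Map<String, byte[]> nodes = new HashMap<>();
    private final List<BiConsumer<String, byte[]>> createListeners = new ArrayList<>();

    void onCreate(BiConsumer<String, byte[]> listener) {
        createListeners.add(listener);
    }

    // Creating a node notifies listeners immediately, with whatever data
    // the node holds at that moment.
    void create(String path, byte[] data) {
        nodes.put(path, data);
        for (BiConsumer<String, byte[]> listener : createListeners) {
            listener.accept(path, data);
        }
    }

    // Later writes do not re-trigger the creation listeners.
    void setData(String path, byte[] data) {
        nodes.put(path, data);
    }
}

class ZnodeRaceDemo {
    private static final String PATH = "/hbase/testSimple/acquired/instanceTest";

    // Returns the data length the listener observed when the node appeared.
    static int observedLength(boolean writeDataAtCreate) {
        FakeZnodeStore store = new FakeZnodeStore();
        int[] seen = {-1};
        store.onCreate((path, data) -> seen[0] = data.length);

        byte[] payload = new byte[7]; // 7 bytes, the size reported for passing runs
        if (writeDataAtCreate) {
            store.create(PATH, payload);      // data present when listeners fire
        } else {
            store.create(PATH, new byte[0]);  // listeners fire on an empty node
            store.setData(PATH, payload);     // data arrives too late for them
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("two-step create: " + observedLength(false) + " byte(s)");
        System.out.println("create with data: " + observedLength(true) + " byte(s)");
    }
}
```

If the real code path does turn out to create the node and write its data in two steps, supplying the data in the creating call, or having the member re-read or wait when it sees an empty payload, would be the two obvious directions to look at.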
[jira] [Resolved] (HBASE-8013) TestZKProcedureControllers fails intermittently in trunk builds
[ https://issues.apache.org/jira/browse/HBASE-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-8013. --- Resolution: Duplicate Duplicate with HBASE-8142 TestZKProcedureControllers fails intermittently in trunk builds --- Key: HBASE-8013 URL: https://issues.apache.org/jira/browse/HBASE-8013 Project: HBase Issue Type: Bug Reporter: Ted Yu See https://builds.apache.org/job/HBase-TRUNK/3918/testReport/org.apache.hadoop.hbase.procedure/TestZKProcedureControllers/testSimpleZKCohortMemberController/ This seems to be the reason: {code} 2013-03-06 10:35:31,088 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(218): Illegal argument exception java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2013-03-06 10:35:31,090 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(281): Failed due to null subprocedure java.lang.IllegalArgumentException via expected:java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 
at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:219) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) ... 6 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605802#comment-13605802 ] Hadoop QA commented on HBASE-7568: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574242/HBASE-7568-trunk-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4876//console This message is automatically generated. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8119) Optimize StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-8119: - Summary: Optimize StochasticLoadBalancer (was: StochasticLoadBalancer does not take into account per table balance) Optimize StochasticLoadBalancer --- Key: HBASE-8119 URL: https://issues.apache.org/jira/browse/HBASE-8119 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.95.0 Reporter: Enis Soztutar Fix For: 0.95.0 On a 5 node trunk cluster, I ran into a weird problem with StochasticLoadBalancer:
server1 Thu Mar 14 03:42:50 UTC 2013 0.0 33
server2 Thu Mar 14 03:47:53 UTC 2013 0.0 34
server3 Thu Mar 14 03:46:53 UTC 2013 465.0 42
server4 Thu Mar 14 03:47:53 UTC 2013 11455.0 282
server5 Thu Mar 14 03:47:53 UTC 2013 0.0 34
Total:5 11920 425
Notice that server4 has 282 regions, while the others have far fewer. Plus, one table with 260 regions is badly imbalanced:
{code}
Regions by Region Server
Region Server Region Count
http://server3:60030/ 10
http://server4:60030/ 250
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
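To put a number on the imbalance in the HBASE-8119 report, the per-server region counts (33, 34, 42, 282, 34) can be compared against their mean. The helper below is plain arithmetic for illustration, with a made-up class name (RegionSkew); it is not the cost function StochasticLoadBalancer uses.

```java
// Plain arithmetic on the region counts quoted in the report; this is not
// the StochasticLoadBalancer cost function. RegionSkew is a hypothetical name.
class RegionSkew {
    // Average number of regions per server.
    static double mean(int[] counts) {
        long sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return (double) sum / counts.length;
    }

    // How many times the busiest server exceeds the average.
    static double maxOverMean(int[] counts) {
        int max = Integer.MIN_VALUE;
        for (int c : counts) {
            max = Math.max(max, c);
        }
        return max / mean(counts);
    }

    public static void main(String[] args) {
        int[] regions = {33, 34, 42, 282, 34};
        System.out.printf("mean=%.1f maxOverMean=%.2f%n",
                mean(regions), maxOverMean(regions));
    }
}
```

The 425 regions average out to 85 per server, so server4 with 282 carries roughly 3.3 times its even share.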
[jira] [Updated] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7905: - Attachment: 7905v17.txt Fix TestHCM at least. Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v17.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire.
[jira] [Updated] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7905: - Status: Patch Available (was: Open) Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v17.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire.
[jira] [Updated] (HBASE-8138) Using [packed=true] for repeated field of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types)
[ https://issues.apache.org/jira/browse/HBASE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-8138: -- Status: Patch Available (was: Open) Using [packed=true] for repeated field of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) Key: HBASE-8138 URL: https://issues.apache.org/jira/browse/HBASE-8138 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Trivial Fix For: 0.98.0 Attachments: hbase-8138.patch It's recommended to do the following for numeric primitive types {quote} For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding {quote} See details at https://developers.google.com/protocol-buffers/docs/proto
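To make the saving from [packed=true] concrete, here is a rough Java sketch that only counts bytes for a repeated varint field under the two wire layouts. It is not taken from hbase-8138.patch and does not use the protobuf library; the class and method names are hypothetical, and it assumes non-negative values.

```java
class PackedSizeSketch {
    /** Bytes needed to varint-encode a value (7 payload bits per byte). */
    static int varintSize(long value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    /** Unpacked layout: every element is written with its own tag (wire type 0). */
    static int unpackedSize(int fieldNumber, long[] values) {
        int tag = varintSize(((long) fieldNumber << 3) | 0);
        int total = 0;
        for (long v : values) {
            total += tag + varintSize(v);
        }
        return total;
    }

    /** Packed layout: one tag (wire type 2), one length prefix, then the varints back to back. */
    static int packedSize(int fieldNumber, long[] values) {
        if (values.length == 0) {
            return 0; // an empty packed field is not written at all
        }
        int payload = 0;
        for (long v : values) {
            payload += varintSize(v);
        }
        int tag = varintSize(((long) fieldNumber << 3) | 2);
        return tag + varintSize(payload) + payload;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 300};
        System.out.println("unpacked bytes: " + unpackedSize(1, values));
        System.out.println("packed bytes:   " + packedSize(1, values));
    }
}
```

The per-element tag is what the packed form removes, so the gap widens as the list gets longer.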
[jira] [Commented] (HBASE-8013) TestZKProcedureControllers fails intermittently in trunk builds
[ https://issues.apache.org/jira/browse/HBASE-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605811#comment-13605811 ] stack commented on HBASE-8013: -- Thanks Ted. I should have noticed this one. TestZKProcedureControllers fails intermittently in trunk builds --- Key: HBASE-8013 URL: https://issues.apache.org/jira/browse/HBASE-8013 Project: HBase Issue Type: Bug Reporter: Ted Yu See https://builds.apache.org/job/HBase-TRUNK/3918/testReport/org.apache.hadoop.hbase.procedure/TestZKProcedureControllers/testSimpleZKCohortMemberController/ This seems to be the reason: {code} 2013-03-06 10:35:31,088 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(218): Illegal argument exception java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2013-03-06 10:35:31,090 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(281): Failed due to null subprocedure java.lang.IllegalArgumentException via expected:java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 
at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:219) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) ... 6 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8119) Optimize StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605827#comment-13605827 ] Enis Soztutar commented on HBASE-8119:
bq. Per table load balancing runs the balancer once per table.
The issue turned out not to be in the per-table load balancing, which already defaults to false. The issue is that for 500 regions the load balancer takes 15 min, which makes it unusable. In its current form, StochasticLoadBalancer can only work with clusters of about 20 nodes and low hundreds of regions.
bq. There's a lot of hashmap manipulation that should be optimized out if we wanted to worry about perf.
If the balancer takes more than 15 min, a bug in HMaster.balance() makes it break off prematurely from assigning the region plans produced by the balancer. One more thing: we do not bulk assign the regions in the plan generated by the load balancer.
Optimize StochasticLoadBalancer --- Key: HBASE-8119 URL: https://issues.apache.org/jira/browse/HBASE-8119 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.95.0 Reporter: Enis Soztutar Fix For: 0.95.0
On a 5 node trunk cluster, I ran into a weird problem with StochasticLoadBalancer:
server1  Thu Mar 14 03:42:50 UTC 2013  0.0      33
server2  Thu Mar 14 03:47:53 UTC 2013  0.0      34
server3  Thu Mar 14 03:46:53 UTC 2013  465.0    42
server4  Thu Mar 14 03:47:53 UTC 2013  11455.0  282
server5  Thu Mar 14 03:47:53 UTC 2013  0.0      34
Total:5                                11920    425
Notice that server4 has 282 regions, while the others have far fewer. In addition, one table with 260 regions is badly imbalanced:
{code}
Regions by Region Server
Region Server            Region Count
http://server3:60030/    10
http://server4:60030/    250
{code}
[jira] [Commented] (HBASE-7255) KV size metric went missing from StoreScanner.
[ https://issues.apache.org/jira/browse/HBASE-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605832#comment-13605832 ] Elliott Clark commented on HBASE-7255: -- I don't think that complexity is needed. I think we can add this metric to one of the other regionserver mbeans. KV size metric went missing from StoreScanner. -- Key: HBASE-7255 URL: https://issues.apache.org/jira/browse/HBASE-7255 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Elliott Clark Priority: Critical Fix For: 0.95.0 In trunk due to the metric refactor, at least the KV size metric went missing. See this code in StoreScanner.java: {code} } finally { if (cumulativeMetric > 0 && metric != null) { } } {code} Just an empty if statement, where the metric used to be collected.
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605838#comment-13605838 ] Varun Sharma commented on HBASE-7295: - We are not seeing any issues on the thrift gateway anymore. @lars : I can try comparing volatile and synchronized accesses to the PoolMap of type ReusablePool. @stack : We do iterate over connections in HBaseClient when we try to close down the HBaseClient or stop it. Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.95.0 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt, 7295-trunk-v4.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block; we set ipc.pool.size to 200. Since we are using the RoundRobin/ThreadLocal pool only, it is not necessary to synchronize on connections. Dropping the lock might lead to cases where we go slightly over the ipc.max.pool.size(), but the additional connections would time out after maxIdleTime, and the underlying PoolMap connections object is thread safe.
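The trade-off described in the issue (skip the global lock, tolerate an occasional surplus connection) can be sketched roughly as follows. This is not the HBaseClient or PoolMap code and not any of the attached patches; the class, the Connection stand-in, and the string key are all hypothetical, and a plain ConcurrentHashMap stands in for the thread-safe pool.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCacheSketch {
    /** Hypothetical stand-in for an RPC connection to one region server. */
    static final class Connection {
        final String remote;
        Connection(String remote) { this.remote = remote; }
    }

    /** Counts every Connection object built, including ones that lose a race. */
    final AtomicInteger created = new AtomicInteger();

    private final ConcurrentHashMap<String, Connection> connections = new ConcurrentHashMap<>();

    Connection getConnection(String remote) {
        // Hot path: no monitor is taken, so handler threads do not queue up here.
        Connection existing = connections.get(remote);
        if (existing != null) {
            return existing;
        }
        // Two threads may both reach this point and each build a connection.
        Connection fresh = new Connection(remote);
        created.incrementAndGet();
        Connection raced = connections.putIfAbsent(remote, fresh);
        // The loser of the race uses the winner's connection; its own surplus
        // object is simply dropped in this sketch.
        return raced != null ? raced : fresh;
    }

    public static void main(String[] args) {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();
        Connection a = cache.getConnection("rs1:60020");
        Connection b = cache.getConnection("rs1:60020");
        System.out.println("same connection reused: " + (a == b));
    }
}
```

In this sketch the surplus object is discarded immediately, whereas the issue describes surplus connections that linger until they time out after maxIdleTime. The common point is that correctness does not depend on holding a lock around the lookup.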
[jira] [Commented] (HBASE-8119) Optimize StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605847#comment-13605847 ] Enis Soztutar commented on HBASE-8119:
Quoting review at https://reviews.apache.org/r/9998/:
Attaching a patch that improves the running time of StochasticLoadBalancer by roughly 200x.
TestStochasticLoadBalancer#testMidCluster()
Current impl: //2013-03-15 17:28:25,495 DEBUG [main] balancer.StochasticLoadBalancer(256): Finished computing new laod balance plan. Computation took 172526ms to try 15000 different iterations. Found a solution that moves 600 regions; Going from a computed cost of 35.850001 to a new cost of 23.481578947368426
With patch: //2013-03-18 14:56:13,541 DEBUG [Thread-2] balancer.StochasticLoadBalancer(436): Finished computing new laod balance plan. Computation took 941ms to try 15000 different iterations. Found a solution that moves 600 regions; Going from a computed cost of 35.85 to a new cost of 23.48157894736842
The improvements come from:
- Optimized array based data structures in Cluster class
- Getting rid of hashmaps
- Optimized region move and swap ops
- Moving most of the computation to cluster initialization and to cluster state changes, so the same results are not computed over and over
- Some profiling
There should be further optimizations, but this should be a good start. If we run into more problems, we can investigate further. There are a lot of TODOs added in this patch. I'll create a jira for collecting some thoughts, but I won't have the time to work on those for now. There are (hopefully) minor semantic changes in the algo. I had to bump up loadMultiplier, and decrease moveCostMultiplier. See comments at TestStochasticLoadBalancer#testLargeCluster(). Please review carefully. As noted in testLargeCluster(), this does not work for large clusters 10 regions, 1000 nodes. This can be solved by something like http://en.wikipedia.org/wiki/Simulated_annealing instead of random walk with eager selection.
Optimize StochasticLoadBalancer --- Key: HBASE-8119 URL: https://issues.apache.org/jira/browse/HBASE-8119 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.95.0 Reporter: Enis Soztutar Fix For: 0.95.0
On a 5 node trunk cluster, I ran into a weird problem with StochasticLoadBalancer:
server1  Thu Mar 14 03:42:50 UTC 2013  0.0      33
server2  Thu Mar 14 03:47:53 UTC 2013  0.0      34
server3  Thu Mar 14 03:46:53 UTC 2013  465.0    42
server4  Thu Mar 14 03:47:53 UTC 2013  11455.0  282
server5  Thu Mar 14 03:47:53 UTC 2013  0.0      34
Total:5                                11920    425
Notice that server4 has 282 regions, while the others have far fewer. In addition, one table with 260 regions is badly imbalanced:
{code}
Regions by Region Server
Region Server            Region Count
http://server3:60030/    10
http://server4:60030/    250
{code}
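The difference between a random walk with eager selection and simulated annealing comes down to the rule for accepting a candidate move. The Java sketch below contrasts the two rules; it is not taken from the patch or from StochasticLoadBalancer, the class and method names are hypothetical, and the costs are arbitrary numbers.

```java
import java.util.Random;

class AnnealingSketch {
    /** Eager rule: keep a candidate plan only if it is strictly cheaper. */
    static boolean acceptGreedy(double currentCost, double candidateCost) {
        return candidateCost < currentCost;
    }

    /**
     * Metropolis rule used by simulated annealing: always keep a cheaper plan,
     * and keep a more expensive one with probability exp(-(increase) / temperature).
     */
    static boolean acceptAnnealing(double currentCost, double candidateCost,
                                   double temperature, Random rng) {
        if (candidateCost < currentCost) {
            return true;
        }
        if (temperature <= 0) {
            return false; // fully cooled: behaves like the eager rule
        }
        double probability = Math.exp((currentCost - candidateCost) / temperature);
        return rng.nextDouble() < probability;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int accepted = 0;
        for (int i = 0; i < 1000; i++) {
            if (acceptAnnealing(23.0, 24.0, 2.0, rng)) {
                accepted++;
            }
        }
        // The eager rule would reject all 1000 of these uphill moves.
        System.out.println("uphill moves accepted out of 1000: " + accepted);
    }
}
```

Occasionally accepting a worse plan lets the search leave a local minimum, and lowering the temperature over the iterations makes the rule converge toward eager selection.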
[jira] [Updated] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HBASE-7568: Attachment: HBASE-7568-trunk-v2.patch Missed one line that is over 100. Attaching new patch. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch, HBASE-7568-trunk-v2.patch
[jira] [Commented] (HBASE-7680) implement compaction policy for stripe compactions
[ https://issues.apache.org/jira/browse/HBASE-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605860#comment-13605860 ] Hadoop QA commented on HBASE-7680: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574261/HBASE-7680-v3-with-7679.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 22 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestJoinedScanners Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4877//console This message is automatically generated. 
implement compaction policy for stripe compactions -- Key: HBASE-7680 URL: https://issues.apache.org/jira/browse/HBASE-7680 Project: HBase Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7680-v0.patch, HBASE-7680-v0-with-7679-and-7935.patch, HBASE-7680-v1.patch, HBASE-7680-v1-with-7679.patch, HBASE-7680-v2.patch, HBASE-7680-v2-with-7679-and-8034.patch, HBASE-7680-v3.patch, HBASE-7680-v3-with-7679.patch
[jira] [Commented] (HBASE-7905) Add passing of optional cell blocks over rpc
[ https://issues.apache.org/jira/browse/HBASE-7905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605880#comment-13605880 ] Hadoop QA commented on HBASE-7905: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574267/7905v17.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 55 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 javac{color}. The applied patch generated 6 javac compiler warnings (more than the trunk's current 4 warnings). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestJoinedScanners org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor org.apache.hadoop.hbase.client.TestFromClientSide Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4878//console This message is automatically generated. 
Add passing of optional cell blocks over rpc Key: HBASE-7905 URL: https://issues.apache.org/jira/browse/HBASE-7905 Project: HBase Issue Type: Sub-task Components: IPC/RPC Reporter: stack Assignee: stack Fix For: 0.95.0 Attachments: 7900v12-depends-on-8101.txt, 7905.txt, 7905v13.txt, 7905v14.txt, 7905v15.txt, 7905v16.txt, 7905v17.txt, 7905v3.txt, 7905v4.txt, 7905v6.txt, 7905v8.txt, 7905v9.txt, testipc_for_pre_cellblocks.txt Make it so we can pass Cells/data w/o having to bury it all in protobuf to get it over the wire.
[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations
[ https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605886#comment-13605886 ] Enis Soztutar commented on HBASE-7305:
Thanks Jon, I was in the process of updating the doc, but got carried away with more pressing issues. I'll attach an updated version. Your overview seems about right.
bq. This primarily protects operations that clash with table level enable/disable/alter, but not region level operations, right?.
If you mean assign, etc.: no, it does not.
bq. This doesn't guard meta from individual changes, right? It only protects meta from bulk adds (create/delete table). Thus this shouldn't affect region moves or region closes/opens.
There is no guard against changes to META. Region moves and open/close do not acquire a lock.
bq. If an overlapping merge and split were issued, some other mechanism is in place to keep this sane right? This doesn't protect multiple merge requests with overlapping regions right?
bq. Merges will likely want the read lock? (allowing multiple concurrent merges, and assuming some overlap sanity protection from a different mechanism).
Merges can be designed to acquire either a read lock or a write lock. With a read lock, there is no guarantee against a merge and a concurrent split, but merges for different ranges can happen at the same time. With a write lock, we guard against the concurrent merge / split problem, but we cannot do multiple merges at the same time. The recent patch for HBASE-7403 moves the regions to be merged to the same region server. We might be able to do in-memory locking for merge and split in the RS, so that we might be able to use read locking for merges.
bq. With snapshots, this mechanism doesn't prevent regions from moving so it only protects snapshots from concurrently happening with enable/disable/alter table ops. Snapshot will still fail if it gets caught while the balancer is running.
Yes, there is no protection against that right now. I have to look up why a region move causes a snapshot to fail.
bq. These locks don't really help hbck – except for the cases where enable/disable/alter operations are going on as hbck repairs things. (It wouldn't protect hbck from the balancer).
hbck as it is relies too much on knowing about the filesystem layout and META. It is hard to sync between the balancer and hbck.
bq. Does having a table lock (and then having individual region locks that require a table read lock being held) make sense? Maybe this makes sense for merges and splits?
If we have per-region locks, we might reevaluate table locks. But I would imagine so, since it will prevent concurrent master operations as well. We can achieve the same thing by acquiring all the region locks, but table locks would be faster.
bq. having individual region locks that require a table read lock being held
I think we have to evaluate whether this is feasible. I guess it should be, but we should be able to scale to millions of regions. If we had per-region locks, assignment would become much easier (current RIT is similar, but we need this even for assigned regions).
ZK based Read/Write locks for table operations -- Key: HBASE-7305 URL: https://issues.apache.org/jira/browse/HBASE-7305 Project: HBase Issue Type: Bug Components: Client, master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.0 Attachments: 130228-zkrwlocks.pdf, 7305-v11.txt, hbase-7305_v0.patch, hbase-7305_v10.patch, hbase-7305_v13.patch, hbase-7305_v14.patch, hbase-7305_v15.patch, hbase-7305_v1-based-on-curator.patch, hbase-7305_v2.patch, hbase-7305_v4.patch, hbase-7305_v9.patch, HBaseTableLocks.pdf This has started as forward porting of HBASE-5494 and HBASE-5991 from the 89-fb branch to trunk, but diverged enough to have its own issue. The idea is to implement a zk based read/write lock per table. Master initiated operations should get the write lock, and region operations (region split, moving, balance?, etc) acquire a shared read lock.
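The read/write semantics discussed here (shared lock for region-level operations, exclusive lock for table-level operations) can be illustrated in a single JVM. The Java sketch below uses java.util.concurrent.locks.ReentrantReadWriteLock as an in-process stand-in; it is not the ZooKeeper-based implementation from the patch, and the class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class TableLockSketch {
    /** One read/write lock per table name, created on first use. */
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
        new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String table) {
        return locks.computeIfAbsent(table, t -> new ReentrantReadWriteLock());
    }

    /** Region-level operation (for example a split or merge): shared read lock. */
    boolean tryRegionOperation(String table) {
        return lockFor(table).readLock().tryLock();
    }

    void finishRegionOperation(String table) {
        lockFor(table).readLock().unlock();
    }

    /** Table-level operation (for example enable/disable/alter): exclusive write lock. */
    boolean tryTableOperation(String table) {
        return lockFor(table).writeLock().tryLock();
    }

    void finishTableOperation(String table) {
        lockFor(table).writeLock().unlock();
    }

    public static void main(String[] args) {
        TableLockSketch locks = new TableLockSketch();
        System.out.println("first merge:  " + locks.tryRegionOperation("t1"));
        System.out.println("second merge: " + locks.tryRegionOperation("t1"));
        System.out.println("alter table:  " + locks.tryTableOperation("t1"));
    }
}
```

With read locks, two merges on the same table proceed together while an alter has to wait, which is the trade-off described in the comment: concurrency between merges, but no protection of one merge against another region-level operation on overlapping ranges.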
[jira] [Reopened] (HBASE-8067) TestHFileArchiving.testArchiveOnTableDelete sometimes fails
[ https://issues.apache.org/jira/browse/HBASE-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-8067: --- TestHFileArchiving.testArchiveOnTableDelete sometimes fails --- Key: HBASE-8067 URL: https://issues.apache.org/jira/browse/HBASE-8067 Project: HBase Issue Type: Bug Components: Admin, master, test Affects Versions: 0.96.0, 0.94.6 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 0.95.0, 0.94.7 Attachments: HBASE-8067-debug.patch, HBASE-8067-v0.patch It seems that testArchiveOnTableDelete() fails because the archiving in DeleteTableHandler is still in progress when admin.deleteTable() returns. {code} Error Message Archived files are missing some of the store files! Stacktrace java.lang.AssertionError: Archived files are missing some of the store files! at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hbase.backup.TestHFileArchiving.testArchiveOnTableDelete(TestHFileArchiving.java:262) {code} (Looking at the problem in a more generic way, we don't have any way to inform the client when an async operation is completed.)
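Since the client gets no completion signal for the asynchronous archiving, one common way for a test to cope is to poll for the expected state with a timeout instead of asserting immediately. The Java sketch below shows that pattern in isolation; it is an assumption about a possible workaround, not what the attached patches do, and the helper name and the condition are hypothetical.

```java
import java.util.function.BooleanSupplier;

class WaitForSketch {
    /**
     * Polls the condition every intervalMs until it holds or timeoutMs elapses.
     * Returns true if the condition became true in time, false otherwise.
     */
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "all store files have shown up in the archive directory".
        boolean done = waitFor(2000, 10, () -> System.currentTimeMillis() - start >= 100);
        System.out.println("condition met before timeout: " + done);
    }
}
```

A test would pass a condition that checks the archive directory and assert on the returned boolean, so a slow archiver produces a timeout failure rather than a race.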
[jira] [Commented] (HBASE-8138) Using [packed=true] for repeated field of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types)
[ https://issues.apache.org/jira/browse/HBASE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605909#comment-13605909 ] Hadoop QA commented on HBASE-8138: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574187/hbase-8138.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4879//console This message is automatically generated. Using [packed=true] for repeated field of primitive numeric types (types which use the varint, 32-bit, or 64-bit wire types) Key: HBASE-8138 URL: https://issues.apache.org/jira/browse/HBASE-8138 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Priority: Trivial Fix For: 0.98.0 Attachments: hbase-8138.patch It's recommended to do the following for numeric primitive types {quote} For historical reasons, repeated fields of basic numeric types aren't encoded as efficiently as they could be. 
New code should use the special option [packed=true] to get a more efficient encoding {quote} See details at https://developers.google.com/protocol-buffers/docs/proto
[jira] [Commented] (HBASE-7568) [replication] Create an interface for replication queues
[ https://issues.apache.org/jira/browse/HBASE-7568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605917#comment-13605917 ] Hadoop QA commented on HBASE-7568: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12574274/HBASE-7568-trunk-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/4880//console This message is automatically generated. [replication] Create an interface for replication queues Key: HBASE-7568 URL: https://issues.apache.org/jira/browse/HBASE-7568 Project: HBase Issue Type: Sub-task Components: Replication Reporter: Chris Trezzo Assignee: Chris Trezzo Fix For: 0.95.0, 0.96.0, 0.98.0 Attachments: HBASE-7568-trunk-v1.patch, HBASE-7568-trunk-v2.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8131) Create table handler needs to handle failure cases.
[ https://issues.apache.org/jira/browse/HBASE-8131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605918#comment-13605918 ] chunhui shen commented on HBASE-8131:
{code}
+    for (int i = 0; i < 100; i++) {
+      if (!TEST_UTIL.getHBaseAdmin().isTableAvailable(TABLENAME)) {
+        Thread.sleep(200);
+      }
+    }
{code}
Should we also assert that the table is available after this loop?
Create table handler needs to handle failure cases. Key: HBASE-8131 URL: https://issues.apache.org/jira/browse/HBASE-8131 Project: HBase Issue Type: Bug Components: master Affects Versions: 0.98.0 Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Attachments: HBASE-8131_trunk_1.patch, HBASE-8131_trunk_2.patch, HBASE-8131_trunk.patch In CreateTableHandler there are a number of failure cases. IOExceptions are common during creation of regioninfos, htableDescriptors, etc. After such an exception, if I try to recreate the table using admin, we need to remove the acquired table lock and also clear the ZKTable in-memory cache so that the operation can be retried.
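The review question above (poll, then assert) can be sketched with a small helper that stops polling as soon as the condition holds and reports the final state, so the test can assert on it. This is an assumption-laden sketch, not code from the patch: WaitForConditionSketch and waitFor are hypothetical names, and the condition is passed as a plain BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of HBaseTestingUtility or the attached patch.
final class WaitForConditionSketch {
    // Polls the condition up to maxAttempts times, sleeping between attempts.
    // Returns true as soon as the condition holds, false if it never did.
    static boolean waitFor(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int i = 0; i < maxAttempts; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return condition.getAsBoolean(); // one last check after the final sleep
    }
}
```

In the test under review, the result of such a call (100 attempts, 200 ms apart) would be passed to an assertion. Note that isTableAvailable declares IOException, so the lambda handed to the helper would have to wrap that call; that wrapping is left out here.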
[jira] [Comment Edited] (HBASE-8119) Optimize StochasticLoadBalancer
[ https://issues.apache.org/jira/browse/HBASE-8119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605847#comment-13605847 ] Enis Soztutar edited comment on HBASE-8119 at 3/19/13 1:40 AM:
Quoting review at https://reviews.apache.org/r/9998/
Attaching a patch that improves the running time of StochasticLoadBalancer by about 200x.
TestStochasticLoadBalancer#testMidCluster()
Current impl:
//2013-03-15 17:28:25,495 DEBUG [main] balancer.StochasticLoadBalancer(256): Finished computing new laod balance plan. Computation took 172526ms to try 15000 different iterations. Found a solution that moves 600 regions; Going from a computed cost of 35.850001 to a new cost of 23.481578947368426
With patch:
//2013-03-18 14:56:13,541 DEBUG [Thread-2] balancer.StochasticLoadBalancer(436): Finished computing new laod balance plan. Computation took 941ms to try 15000 different iterations. Found a solution that moves 600 regions; Going from a computed cost of 35.85 to a new cost of 23.48157894736842
The improvements come from:
- Optimized array based data structures in Cluster class
- Getting rid of hashmaps
- Optimized region move and swap ops
- Moving most of the computation to cluster initialization and to cluster state changes, thus eliminating computing the same results over and over
- Some profiling
There should be further optimizations, but this should be a good start. If we run into more problems, we can investigate further. There are a lot of TODOs added in this patch. I'll create a jira for collecting some thoughts, but I won't have the time to work on those for now. There are (hopefully) minor semantic changes in the algo. I had to bump up loadMultiplier, and decrease moveCostMultiplier. See comments at TestStochasticLoadBalancer#testLargeCluster(). Please review carefully. As noted in testLargeCluster(), this does not work for large clusters 10 regions, 1000 nodes. This can be solved by something like http://en.wikipedia.org/wiki/Simulated_annealing instead of random walk with eager selection.
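The closing suggestion in the comment above contrasts two acceptance rules for a candidate move. A minimal sketch of the difference follows, assuming that "eager selection" means accepting a candidate only when it lowers the cost. The class and method names are hypothetical and this is not the StochasticLoadBalancer code; the annealing rule shown is the standard Metropolis criterion.

```java
import java.util.Random;

// Hypothetical sketch; names are not taken from the HBase balancer.
final class AnnealingAcceptanceSketch {
    // Eager (greedy) rule: only strictly better plans are kept.
    static boolean acceptGreedy(double currentCost, double candidateCost) {
        return candidateCost < currentCost;
    }

    // Metropolis rule used by simulated annealing: better plans are always kept,
    // worse plans are kept with probability exp(-(increase) / temperature).
    static boolean acceptAnnealing(double currentCost, double candidateCost,
                                   double temperature, Random rng) {
        if (candidateCost < currentCost) {
            return true;
        }
        if (temperature <= 0.0) {
            return false; // at zero temperature this degenerates to the greedy rule
        }
        double probability = Math.exp((currentCost - candidateCost) / temperature);
        return rng.nextDouble() < probability;
    }
}
```

Lowering the temperature over the course of the iterations lets the search escape local minima early on and settle into greedy behaviour at the end. Whether that helps on the large-cluster case mentioned above would need to be measured.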
Optimize StochasticLoadBalancer --- Key: HBASE-8119 URL: https://issues.apache.org/jira/browse/HBASE-8119 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 0.95.0 Reporter: Enis Soztutar Fix For: 0.95.0 On a 5 node trunk cluster, I ran into a weird problem with StochasticLoadBalancer:
server1  Thu Mar 14 03:42:50 UTC 2013  0.0      33
server2  Thu Mar 14 03:47:53 UTC 2013  0.0      34
server3  Thu Mar 14 03:46:53 UTC 2013  465.0    42
server4  Thu Mar 14 03:47:53 UTC 2013  11455.0  282
server5  Thu Mar 14 03:47:53 UTC 2013  0.0      34
Total:5                                11920    425
Notice that server4 has 282 regions, while the others have much less. Plus for one table
[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations
[ https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605930#comment-13605930 ] Sergey Shelukhin commented on HBASE-7305: What I had in mind in the comments in HBASE-5487 was a persistent central state machine + version per region (in ZK or a table), and per table. It should allow multiple operations to proceed in parallel as long as it's logically feasible (e.g. if a split is opening daughters and an alter table comes in, you just bump the version on the node and the server has to reopen, etc.). For table-wide ops like alters I am +1 on locking (it could be done via versions too though, e.g. why not allow parallel alters during region opening; but this is probably not important). ZK based Read/Write locks for table operations Key: HBASE-7305 URL: https://issues.apache.org/jira/browse/HBASE-7305 Project: HBase Issue Type: Bug Components: Client, master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.95.0 Attachments: 130228-zkrwlocks.pdf, 7305-v11.txt, hbase-7305_v0.patch, hbase-7305_v10.patch, hbase-7305_v13.patch, hbase-7305_v14.patch, hbase-7305_v15.patch, hbase-7305_v1-based-on-curator.patch, hbase-7305_v2.patch, hbase-7305_v4.patch, hbase-7305_v9.patch, HBaseTableLocks.pdf This started as a forward port of HBASE-5494 and HBASE-5991 from the 89-fb branch to trunk, but diverged enough to have its own issue. The idea is to implement a zk based read/write lock per table. Master initiated operations should get the write lock, and region operations (region split, moving, balance?, etc) acquire a shared read lock.
[jira] [Commented] (HBASE-8097) MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing
[ https://issues.apache.org/jira/browse/HBASE-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605931#comment-13605931 ] Hudson commented on HBASE-8097: Integrated in hbase-0.95 #83 (See [https://builds.apache.org/job/hbase-0.95/83/]) HBASE-8097 MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing (Jeffrey Zhong) (Revision 1457934) Result = FAILURE tedyu : Files : * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/DeadServer.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java * /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java MetaServerShutdownHandler may potentially keep bumping up DeadServer.numProcessing Key: HBASE-8097 URL: https://issues.apache.org/jira/browse/HBASE-8097 Project: HBase Issue Type: Bug Reporter: Jeffrey Zhong Assignee: Jeffrey Zhong Fix For: 0.95.0, 0.98.0 Attachments: 8097.txt, hbase-8097_1.patch, hbase-8097_v2.patch, hbase-8097_v3.patch
{code}
} catch (IOException ioe) {
  this.services.getExecutorService().submit(this);
  this.deadServers.add(serverName);
  throw new IOException("failed log splitting for " + serverName + ", will retry", ioe);
}
{code}
Every retry runs this.deadServers.add(serverName) again, which keeps incrementing DeadServer.numProcessing. We can't get rid of numProcessing by just checking deadServers.size() because deadServers is also used to report some historically failed RSs.
[jira] [Commented] (HBASE-8141) Remove accidental uses of org.mortbay.log.Log
[ https://issues.apache.org/jira/browse/HBASE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605932#comment-13605932 ] Hudson commented on HBASE-8141: --- Integrated in hbase-0.95 #83 (See [https://builds.apache.org/job/hbase-0.95/83/]) HBASE-8141. Remove accidental uses of org.mortbay.log.Log (Revision 1458002) Result = FAILURE apurtell : Files : * /hbase/branches/0.95/hbase-prefix-tree/src/test/java/org/apache/hadoop/hbase/codec/prefixtree/builder/TestTreeDepth.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/ipc/TestDelayedRpc.java Remove accidental uses of org.mortbay.log.Log - Key: HBASE-8141 URL: https://issues.apache.org/jira/browse/HBASE-8141 Project: HBase Issue Type: Bug Affects Versions: 0.95.0, 0.96.0, 0.94.6 Reporter: Andrew Purtell Assignee: Andrew Purtell Priority: Trivial Fix For: 0.95.0, 0.96.0, 0.94.6 Attachments: 8141-0.94.patch, 8141-trunk.patch Remove accidental uses of org.mortbay.log.Log. Eclipse autocomplete is probably the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8108) Add m2eclispe lifecycle mapping to hbase-common
[ https://issues.apache.org/jira/browse/HBASE-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13605933#comment-13605933 ] Hudson commented on HBASE-8108: --- Integrated in hbase-0.95 #83 (See [https://builds.apache.org/job/hbase-0.95/83/]) HBASE-8108: Add m2eclispe lifecycle mapping to hbase-common (Revision 1458019) Result = FAILURE jyates : Files : * /hbase/branches/0.95/hbase-common/pom.xml Add m2eclispe lifecycle mapping to hbase-common --- Key: HBASE-8108 URL: https://issues.apache.org/jira/browse/HBASE-8108 Project: HBase Issue Type: Bug Components: build Affects Versions: 0.95.0, 0.98.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 0.95.0, 0.98.0 Attachments: hbase-8108.patch, hbase-8108-v2.patch The maven-antrun-plugin execution doesn't have a default mapping in m2eclipse, so if you import the project into eclipse, you will get an error that the mapping is undefined. All that's needed is to define an execution via the org.eclipse.m2 lifecycle-mapping plugin - it doesn't actually affect the usual maven build at all. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7597) TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky
[ https://issues.apache.org/jira/browse/HBASE-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605934#comment-13605934 ] Hudson commented on HBASE-7597: Integrated in hbase-0.95 #83 (See [https://builds.apache.org/job/hbase-0.95/83/]) HBASE-7597 TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky (Revision 1458061) Result = FAILURE jxiang : Files : * /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java TestHBaseFsck#testRegionShouldNotBeDeployed seems to be flaky Key: HBASE-7597 URL: https://issues.apache.org/jira/browse/HBASE-7597 Project: HBase Issue Type: Bug Reporter: Jean-Marc Spaggiari Assignee: Jimmy Xiang Fix For: 0.95.0, 0.98.0 Attachments: trunk-7597.patch I ran the entire test suite many times and it always failed on, at least, testRegionShouldNotBeDeployed. Results below. I will attach more results when the current tests are done.
Failed tests:
testDeleteExpiredStoreFiles(org.apache.hadoop.hbase.regionserver.TestStore): expected:<2> but was:<4>
testAcquireTaskAtStartup(org.apache.hadoop.hbase.regionserver.TestSplitLogWorker): Waiting timed out after [1 000] msec
testRegionShouldNotBeDeployed(org.apache.hadoop.hbase.util.TestHBaseFsck): expected:<[SHOULD_NOT_BE_DEPLOYED]> but was:<[]>
testPermissionsWatcher(org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher)
[jira] [Updated] (HBASE-7878) recoverFileLease does not check return value of recoverLease
[ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7878: -- Attachment: 7878-trunk-v10.txt recoverFileLease does not check return value of recoverLease Key: HBASE-7878 URL: https://issues.apache.org/jira/browse/HBASE-7878 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.95.0, 0.94.6 Reporter: Eric Newton Assignee: Ted Yu Priority: Critical Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 7878.94, 7878-94.addendum, 7878-94.addendum2, 7878-trunk.addendum, 7878-trunk.addendum2, 7878-trunk-v10.txt, 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt, 7878-trunk-v5.txt, 7878-trunk-v6.txt, 7878-trunk-v7.txt, 7878-trunk-v8.txt, 7878-trunk-v9.txt, 7878-trunk-v9.txt I think this is a problem, so I'm opening a ticket so an HBase person takes a look. Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease recovery for Accumulo after HBase's lease recovery. During testing, we experienced data loss. I found it is necessary to wait until recoverLease returns true to know that the file has been truly closed. In FSHDFSUtils, the return result of recoverLease is not checked. In the unit tests created to check lease recovery in HBASE-2645, the return result of recoverLease is always checked. I think FSHDFSUtils should be modified to check the return result, and wait until it returns true. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
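The fix proposed in the description above (keep calling recoverLease until it returns true) can be sketched as a bounded retry loop. Everything here is an assumption for illustration: LeaseRecoverer is a hypothetical stand-in for the file system call (the real HDFS method also declares IOException, which is omitted to keep the sketch self-contained), and the attempt limit and pause are arbitrary parameters, not values from any of the attached patches.

```java
// Hypothetical sketch of "wait until recoverLease returns true"; not the FSHDFSUtils code.
final class LeaseRecoverySketch {
    // Stand-in for the file system; returns true once the file is closed.
    interface LeaseRecoverer {
        boolean recoverLease(String path);
    }

    // Calls recoverLease until it reports the file closed or attempts run out.
    // Returns true only if the lease was actually recovered.
    static boolean recoverUntilClosed(LeaseRecoverer fs, String path,
                                      int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return true; // file is closed; safe to read the log
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the final attempt
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // let the caller see the interrupt
                return false;
            }
        }
        return false; // caller must not treat the file as safely closed
    }
}
```

The point of returning the boolean is that the caller can refuse to proceed with log splitting when recovery did not complete, instead of silently reading a file that may still be open.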
[jira] [Updated] (HBASE-7878) recoverFileLease does not check return value of recoverLease
[ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7878: -- Attachment: (was: 7878-trunk-v10.txt) recoverFileLease does not check return value of recoverLease Key: HBASE-7878 URL: https://issues.apache.org/jira/browse/HBASE-7878 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.95.0, 0.94.6 Reporter: Eric Newton Assignee: Ted Yu Priority: Critical Fix For: 0.95.0, 0.98.0, 0.94.7 Attachments: 7878.94, 7878-94.addendum, 7878-94.addendum2, 7878-trunk.addendum, 7878-trunk.addendum2, 7878-trunk-v10.txt, 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt, 7878-trunk-v5.txt, 7878-trunk-v6.txt, 7878-trunk-v7.txt, 7878-trunk-v8.txt, 7878-trunk-v9.txt, 7878-trunk-v9.txt I think this is a problem, so I'm opening a ticket so an HBase person takes a look. Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease recovery for Accumulo after HBase's lease recovery. During testing, we experienced data loss. I found it is necessary to wait until recoverLease returns true to know that the file has been truly closed. In FSHDFSUtils, the return result of recoverLease is not checked. In the unit tests created to check lease recovery in HBASE-2645, the return result of recoverLease is always checked. I think FSHDFSUtils should be modified to check the return result, and wait until it returns true. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-8142) Sporadic TestZKProcedureControllers failures on trunk
[ https://issues.apache.org/jira/browse/HBASE-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Zhong updated HBASE-8142: Attachment: hase-8142_v1.patch The test fails because ZKUtil.createSetData, as called from TestZKProcedureControllers.java, is NOT an atomic operation. As a result, you will sometimes see a znode that has been created but has no data yet. {code} ZKUtil.createSetData(watcher, prepare, ProtobufUtil.prependPBMagic(data)); {code} I changed ZKUtil.createSetData to do the create and the set atomically, which fixes the test case. Jonathan may still need to double-check whether the non-test code has to handle this case. [~jmhsieh] Do you need to patch startNewSubprocedure in order to handle the possible non-atomic scenario? Thanks, -Jeffrey Sporadic TestZKProcedureControllers failures on trunk Key: HBASE-8142 URL: https://issues.apache.org/jira/browse/HBASE-8142 Project: HBase Issue Type: Bug Reporter: stack Attachments: hase-8142_v1.patch See https://builds.apache.org/job/PreCommit-HBASE-Build/4865//artifact/trunk/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.procedure.TestZKProcedureControllers.txt and https://builds.apache.org/job/PreCommit-HBASE-Build/4865//artifact/trunk/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.procedure.TestZKProcedureControllers-output.txt I see this in the output: {code} 2013-03-18 17:30:46,672 DEBUG [Thread-2-EventThread] zookeeper.ZKUtil(1682): testing utility-0x13d7e8da759 Retrieved 0 byte(s) of data from znode /hbase/testSimple/acquired/instanceTest; data=empty 2013-03-18 17:30:46,672 DEBUG [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(206): start proc data length is 0 2013-03-18 17:30:46,672 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(210): Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 
2013-03-18 17:30:46,673 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(218): Illegal argument exception java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 2013-03-18 17:30:46,675 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(281): Failed due to null subprocedure java.lang.IllegalArgumentException via expected:java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. 
at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:219) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56) at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) Caused by: java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure. at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211) ... 6 more {code} The znode has zero data (Usually it has 7 bytes when test runs fine). Is the latch being triggered on node create before data is written? Pointers appreciated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
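The question in the description above (is the watcher triggered on node create before the data is written?) can be illustrated with a toy in-memory model. This is deliberately not ZooKeeper: ZnodeCreateRaceSketch is a hypothetical class in which a "watcher" reads the node synchronously at creation time, which makes the worst-case interleaving deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the race; hypothetical names, no ZooKeeper involved.
final class ZnodeCreateRaceSketch {
    private final Map<String, byte[]> nodes = new HashMap<>();
    private final List<Integer> observedLengths = new ArrayList<>();

    // Simulates a watcher that reads the node the moment it appears.
    private void fireCreated(String path) {
        observedLengths.add(nodes.get(path).length);
    }

    // Atomic variant: the node becomes visible with its data already in place.
    void createWithData(String path, byte[] data) {
        nodes.put(path, data.clone());
        fireCreated(path);
    }

    // Non-atomic variant: the node is visible (and watched) before the data arrives.
    void createThenSet(String path, byte[] data) {
        nodes.put(path, new byte[0]);
        fireCreated(path);
        nodes.put(path, data.clone());
    }

    // Data lengths seen by the simulated watcher, in order of creation events.
    List<Integer> observedLengths() {
        return new ArrayList<>(observedLengths);
    }
}
```

With a 7-byte payload, the non-atomic path lets the watcher observe 0 bytes, matching the "Retrieved 0 byte(s)" log line, while the atomic path observes all 7. In the real client, passing the payload directly to the ZooKeeper create call makes the node and its data appear in one step.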