[jira] [Commented] (HBASE-14227) Fold special cased MOB APIs into existing APIs
[ https://issues.apache.org/jira/browse/HBASE-14227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718126#comment-14718126 ] Hadoop QA commented on HBASE-14227: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752927/HBASE-14227_v5.patch against master branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d. ATTACHMENT ID: 12752927 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1851 checkstyle errors (more than the master's current 1849 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestIOFencing {color:red}-1 core zombie tests{color}. 
There are 7 zombie test(s): at org.apache.hadoop.hbase.security.access.TestAccessController.testAccessControllerUserPermsRegexHandling(TestAccessController.java:2560) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15313//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15313//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15313//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15313//console This message is automatically generated. Fold special cased MOB APIs into existing APIs -- Key: HBASE-14227 URL: https://issues.apache.org/jira/browse/HBASE-14227 Project: HBase Issue Type: Task Components: mob Affects Versions: 2.0.0 Reporter: Andrew Purtell Assignee: Heng Chen Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-14227.patch, HBASE-14227_v1.patch, HBASE-14227_v2.patch, HBASE-14227_v3.patch, HBASE-14227_v4.patch, HBASE-14227_v5.patch There are a number of APIs that came in with MOB that are not new actions for HBase, simply new actions for a MOB implementation: - compactMob - compactMobs - majorCompactMob - majorCompactMobs - getMobCompactionState And in HBaseAdmin: - validateMobColumnFamily Remove these special cases from the Admin API where possible by folding them into existing APIs. We definitely don't need one method for a singleton and another for collections. Ideally we will not have any APIs named *Mob when finished, whether MOBs are in use on a table or not should be largely an internal detail. Exposing as schema option would be fine, this conforms to existing practice for other features. Marking critical because I think removing the *Mob special cased APIs should be a precondition for release of this feature either in 2.0 or as a backport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
[ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718138#comment-14718138 ] Heng Chen commented on HBASE-14327: --- I met this problem too. https://builds.apache.org/job/PreCommit-HBASE-Build/15313//testReport/ TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky -- Key: HBASE-14327 URL: https://issues.apache.org/jira/browse/HBASE-14327 Project: HBase Issue Type: Bug Components: test Reporter: Dima Spivak Priority: Critical I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days. Some probably-relevant output from [a recent run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]: {noformat} 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing compaction to proceed 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files. 
2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc 2015-08-27 18:50:14,342 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. 
into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 K. This selection was in queue for 0sec, and took 10sec to execute. 2015-08-27 18:50:14,343 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=73.1 K, priority=998, time=525052314434020; duration=10sec 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/2926c09f1941416eb557ee5d283d7e2b 2015-08-27 18:50:14,347
[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.
[ https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718186#comment-14718186 ] Hadoop QA commented on HBASE-14261: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752938/HBASE-14261.branch-1_v2.patch against branch-1 branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d. ATTACHMENT ID: 12752938 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 30 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3816 checkstyle errors (more than the master's current 3815 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +clusterManager.kill(ServiceType.HBASE_REGIONSERVER, serverName.getHostname(), serverName.getPort()); +clusterManager.stop(ServiceType.HBASE_REGIONSERVER, serverName.getHostname(), serverName.getPort()); +clusterManager.kill(ServiceType.HBASE_ZOOKEEPER, serverName.getHostname(), serverName.getPort()); +clusterManager.stop(ServiceType.HBASE_ZOOKEEPER, serverName.getHostname(), serverName.getPort()); +clusterManager.start(ServiceType.HADOOP_DATANODE, serverName.getHostname(), serverName.getPort()); +clusterManager.kill(ServiceType.HADOOP_DATANODE, serverName.getHostname(), serverName.getPort()); +clusterManager.stop(ServiceType.HADOOP_DATANODE, serverName.getHostname(), serverName.getPort()); + * 1 SSH options, 2 user name , 3 @ if username is set, 4 host, 5 original command, 6 service user. + private static final String DEFAULT_TUNNEL_CMD = /usr/bin/ssh %1$s %2$s%3$s%4$s \sudo -u %6$s %5$s\; + String cmd = String.format(tunnelCmd, sshOptions, sshUserName, at, hostname, remoteCmd, getServiceUser()); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. There are 5 zombie test(s): at org.apache.cloudstack.GetServiceProviderMetaDataCmdTest.testAuthenticate(GetServiceProviderMetaDataCmdTest.java:83) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15314//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15314//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15314//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15314//console This message is automatically generated. Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections. 
- Key: HBASE-14261 URL: https://issues.apache.org/jira/browse/HBASE-14261 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Attachments: HBASE-14261-branch-1.patch, HBASE-14261.branch-1_v2.patch One of the shortcomings of the existing ChaosMonkey framework is the lack of fault injections for HBase dependencies like ZooKeeper, HDFS, etc. This patch partially addresses the problem by adding datanode and ZooKeeper node fault injections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
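The DEFAULT_TUNNEL_CMD quoted in the lineLengths section above uses positional format specifiers (%1$s through %6$s). JIRA appears to have stripped the quote characters from the template, so the reconstruction below is an assumption, and the class and method names are invented for illustration:

```java
// Hypothetical sketch of how the positional format specifiers in the
// quoted tunnel command template work: %N$s selects the N-th argument,
// so arguments can appear in the template out of order (%6$s before %5$s).
// The quote characters are reconstructed; the original patch line had
// them stripped by JIRA's renderer.
class TunnelCmdDemo {
    static String build(String sshOptions, String user, String at,
                        String host, String cmd, String serviceUser) {
        String template = "/usr/bin/ssh %1$s %2$s%3$s%4$s \"sudo -u %6$s %5$s\"";
        return String.format(template, sshOptions, user, at, host, cmd, serviceUser);
    }
}
```

With arguments ("-p 22", "jenkins", "@", "host1", "ls", "hbase") this yields a command that runs "ls" as the hbase service user over SSH.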
[jira] [Updated] (HBASE-13153) enable bulkload to support replication
[ https://issues.apache.org/jira/browse/HBASE-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Singhi updated HBASE-13153: -- Fix Version/s: 2.0.0 Component/s: (was: API) Replication Issue Type: New Feature (was: Bug) enable bulkload to support replication -- Key: HBASE-13153 URL: https://issues.apache.org/jira/browse/HBASE-13153 Project: HBase Issue Type: New Feature Components: Replication Reporter: sunhaitao Assignee: Ashish Singhi Fix For: 2.0.0 Attachments: HBase Bulk Load Replication.pdf Currently we plan to use the HBase replication feature for a disaster tolerance scenario, but we have run into an issue: we use bulkload very frequently, and because bulkload bypasses the write path it generates no WAL entries, so the bulkloaded data is not replicated to the backup cluster. Running the bulkload twice, on both the active and the backup cluster, is not an appropriate workaround. I suggest modifying the bulkload feature so that a bulkload reaches both the active cluster and the backup cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
[ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718387#comment-14718387 ] Heng Chen commented on HBASE-14327: --- As the test case's comment says: {code} Test that puts up a regionserver, starts a compaction on a loaded region but holds the * compaction completion until after we have killed the server and the region has come up on * a new regionserver altogether. {code} It seems we should check the store files on the new region before the compaction finishes. But the code below {code} LOG.info("Allowing compaction to proceed"); compactingRegion.allowCompactions(); while (compactingRegion.compactCount == 0) { Thread.sleep(1000); } // The server we killed stays up until the compaction that was started before it was killed completes. In logs // you should see the old regionserver now going down. LOG.info("Compaction finished"); // After compaction of old region finishes on the server that was going down, make sure that // all the files we expect are still working when region is up in new location. FileSystem fs = newRegion.getFilesystem(); for (String f: newRegion.getStoreFileList(new byte [][] {FAMILY})) { assertTrue("After compaction, does not exist: " + f, fs.exists(new Path(f))); } {code} checks the store files after the compaction. Is that right? TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky -- Key: HBASE-14327 URL: https://issues.apache.org/jira/browse/HBASE-14327 Project: HBase Issue Type: Bug Components: test Reporter: Dima Spivak Priority: Critical I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days.
Some probably-relevant output from [a recent run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]: {noformat} 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing compaction to proceed 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to 
hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc 2015-08-27 18:50:14,342 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of
[jira] [Updated] (HBASE-14331) a single callQueue related improvements
[ https://issues.apache.org/jira/browse/HBASE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroshi Ikeda updated HBASE-14331: -- Attachment: CallQueuePerformanceTestApp.java Added a simple application to compare the performance of {{LinkedBlockingQueue}} against {{ConcurrentLinkedQueue}} combined with a {{Semaphore}}. The callQueue here is not the one in HBase; it is defined internally as a mere simple interface. The application also shows how many threads reading from the queue were woken simultaneously. In my environment, using {{ConcurrentLinkedQueue}} is about 1.5-2 times faster, aside from whether the performance test is practically meaningful. a single callQueue related improvements --- Key: HBASE-14331 URL: https://issues.apache.org/jira/browse/HBASE-14331 Project: HBase Issue Type: Improvement Components: IPC/RPC, Performance Reporter: Hiroshi Ikeda Priority: Minor Attachments: CallQueuePerformanceTestApp.java {{LinkedBlockingQueue}} separates locks well between the {{take}} method and the {{put}} method, but not between takers, and not between putters. These methods are implemented to take locks almost at the beginning of their logic. HBASE-11355 introduces multiple call-queues to reduce such possible congestion, but I doubt that it is necessary to stick to {{BlockingQueue}}. There are other shortcomings of using {{BlockingQueue}}. When using multiple queues, since {{BlockingQueue}} blocks threads, it is required to prepare enough threads for each queue. It is possible to have one queue starving for threads while threads are idle on another queue. Even if you can tune parameters to avoid such situations, the tuning is not trivial. I suggest using a single {{ConcurrentLinkedQueue}} with a {{Semaphore}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
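The proposal above, a single lock-free queue gated by a semaphore, can be sketched as follows. This is a minimal illustration of the idea, not HBase's actual call queue, and the class name SemaphoreCallQueue is invented for this example:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Sketch of a call queue backed by a single ConcurrentLinkedQueue plus a
// Semaphore that counts the available items. Producers never contend on a
// putter lock, and consumers only block in the semaphore, not on a queue lock.
class SemaphoreCallQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final Semaphore available = new Semaphore(0);

    public void put(T call) {
        queue.offer(call);    // lock-free enqueue
        available.release();  // signal that one more item exists
    }

    public T take() throws InterruptedException {
        available.acquire();  // block until an item is known to exist
        return queue.poll();  // non-null: permits never exceed queued items
    }
}
```

The semaphore's permit count never exceeds the number of enqueued items, so poll() after acquire() cannot observe an empty queue.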
[jira] [Commented] (HBASE-14279) Race condition in ConcurrentIndex
[ https://issues.apache.org/jira/browse/HBASE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718316#comment-14718316 ] Hiroshi Ikeda commented on HBASE-14279: --- There is a similar class, {{KeyLocker}}, whose key type is generic, and you can use it without copying its implementation logic. The implementation of {{IdLocker}} and {{KeyLocker}} is not good and should be rewritten, and I believe it is better to avoid copying their implementation. To begin with, using two {{ConcurrentHashMap}} objects is not good because {{putIfAbsent}} and {{remove}} just use lock striping internally, so the window in which threads may conflict doubles. It is better to use lock striping manually instead of {{ConcurrentHashMap}}. BTW, {{ConcurrentHashMap.get}} locks the segment only when accessing a value, and it is a common idiom to call {{get}} before {{putIfAbsent}}. This could potentially reduce the granularity of locks, though it seems non-trivial to implement. There is another concern. From an object-oriented point of view, exposing internal values through the {{values}} method breaks the invariant that an empty set should be removed from the map. The set should be defensively copied, though that requires changing the API, and possibly the client code that uses this class, which makes the patch more likely to go stale. Race condition in ConcurrentIndex - Key: HBASE-14279 URL: https://issues.apache.org/jira/browse/HBASE-14279 Project: HBase Issue Type: Bug Reporter: Hiroshi Ikeda Assignee: Heng Chen Priority: Minor Attachments: HBASE-14279.patch, HBASE-14279_v2.patch {{ConcurrentIndex.put}} and {{remove}} are in a race condition. It is possible to remove a non-empty set, and to add a value to a removed set. Also {{ConcurrentIndex.values}} is vague in the sense that the returned set sometimes tracks the current state and sometimes doesn't.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
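The get-before-putIfAbsent idiom mentioned in the comment above can be sketched as follows. The class and method names (LockPool, lockFor) are invented for this example, and this is not the actual IdLock/KeyLocker code; note that this sketch never removes entries, so it sidesteps the empty-set invariant being discussed:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of per-key lock lookup using get as a lock-free fast path before
// falling back to putIfAbsent. get avoids the internal striped lock that
// putIfAbsent must take, so the common case (lock already present) is cheap.
class LockPool<K> {
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    public ReentrantLock lockFor(K key) {
        ReentrantLock lock = locks.get(key);   // lock-free fast path
        if (lock != null) {
            return lock;
        }
        ReentrantLock fresh = new ReentrantLock();
        ReentrantLock prior = locks.putIfAbsent(key, fresh);
        return prior != null ? prior : fresh;  // lose the race gracefully
    }
}
```

Two threads racing on the same key both end up with the same lock instance; the loser of the putIfAbsent race simply discards its freshly created lock.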
[jira] [Created] (HBASE-14331) a single callQueue related improvements
Hiroshi Ikeda created HBASE-14331: - Summary: a single callQueue related improvements Key: HBASE-14331 URL: https://issues.apache.org/jira/browse/HBASE-14331 Project: HBase Issue Type: Improvement Components: IPC/RPC, Performance Reporter: Hiroshi Ikeda Priority: Minor {{LinkedBlockingQueue}} separates locks well between the {{take}} method and the {{put}} method, but not between takers, and not between putters. These methods are implemented to take locks almost at the beginning of their logic. HBASE-11355 introduces multiple call-queues to reduce such possible congestion, but I doubt that it is necessary to stick to {{BlockingQueue}}. There are other shortcomings of using {{BlockingQueue}}. When using multiple queues, since {{BlockingQueue}} blocks threads, it is required to prepare enough threads for each queue. It is possible to have one queue starving for threads while threads are idle on another queue. Even if you can tune parameters to avoid such situations, the tuning is not trivial. I suggest using a single {{ConcurrentLinkedQueue}} with a {{Semaphore}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14279) Race condition in ConcurrentIndex
[ https://issues.apache.org/jira/browse/HBASE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-14279: -- Status: Patch Available (was: Open) Race condition in ConcurrentIndex - Key: HBASE-14279 URL: https://issues.apache.org/jira/browse/HBASE-14279 Project: HBase Issue Type: Bug Reporter: Hiroshi Ikeda Assignee: Heng Chen Priority: Minor Attachments: HBASE-14279.patch, HBASE-14279_v2.patch {{ConcurrentIndex.put}} and {{remove}} are in a race condition. It is possible to remove a non-empty set, and to add a value to a removed set. Also {{ConcurrentIndex.values}} is vague in the sense that the returned set sometimes tracks the current state and sometimes doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
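The put/remove race described in this issue can be illustrated with a minimal reconstruction. This assumes ConcurrentIndex maps each key to a concurrent set and drops the set once it drains; the class below is a sketch of the pattern, not the actual HBase source:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of the check-then-act race: without atomicity across the map and
// set operations, remove() can unmap a set that a concurrent put() has just
// retrieved and is about to add to, losing the added value.
class NaiveIndex<K, V extends Comparable<V>> {
    private final Map<K, Set<V>> index = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        Set<V> values = index.get(key);
        if (values == null) {                        // check...
            values = new ConcurrentSkipListSet<>();
            Set<V> prior = index.putIfAbsent(key, values);
            if (prior != null) {
                values = prior;
            }
        }
        values.add(value);   // ...then act: the set may already be unmapped here
    }

    public void remove(K key, V value) {
        Set<V> values = index.get(key);
        if (values == null) {
            return;
        }
        values.remove(value);
        if (values.isEmpty()) {
            index.remove(key);  // races with a concurrent add into this set
        }
    }

    public Set<V> values(K key) {
        return index.get(key);  // exposes the internal set, as the issue notes
    }
}
```

Single-threaded use behaves correctly; the bug only manifests when put and remove interleave on the same key across threads.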
[jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
[ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718366#comment-14718366 ] Heng Chen commented on HBASE-14327: --- After analyzing the log, we can see that the store file which was not found had been archived to another location. {code} 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc {code} So the reader for this store file was closed, and an NPE is thrown when we use the reader to generate the log message {code} for (StoreFile sf: sfs) { message.append(sf.getPath().getName()); message.append(" (size="); message.append(TraditionalBinaryPrefix.long2String(sf.getReader().length(), "", 1)); message.append("), "); } {code} So, to fix this issue, we can stop archiving the store files or delay the archiving. TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky -- Key: HBASE-14327 URL: https://issues.apache.org/jira/browse/HBASE-14327 Project: HBase Issue Type: Bug Components: test Reporter: Dima Spivak Priority: Critical I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days.
Some probably-relevant output from [a recent run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]: {noformat} 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing compaction to proceed 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to 
hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc 2015-08-27 18:50:14,342 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total
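A defensive variant of the logging loop quoted in the comment above would null-check the reader before dereferencing it. The StoreFile and Reader types below are simplified stand-ins for the real hbase-server classes, and this is a sketch of the idea rather than a proposed patch:

```java
import java.util.List;

// Sketch: build the compaction log message while tolerating a store file
// whose reader was already closed by a concurrent archive, instead of
// throwing an NPE from sf.getReader().length().
class CompactionLog {
    // Minimal stand-ins for the real StoreFile/Reader classes.
    static class Reader {
        final long len;
        Reader(long len) { this.len = len; }
        long length() { return len; }
    }
    static class StoreFile {
        final String name;
        final Reader reader;  // may be null once the file is archived
        StoreFile(String name, Reader reader) { this.name = name; this.reader = reader; }
        String getName() { return name; }
        Reader getReader() { return reader; }
    }

    static String describe(List<StoreFile> sfs) {
        StringBuilder message = new StringBuilder();
        for (StoreFile sf : sfs) {
            message.append(sf.getName());
            Reader r = sf.getReader();
            // Fall back to a placeholder when the reader is gone.
            message.append(" (size=").append(r == null ? "unknown" : r.length()).append("), ");
        }
        return message.toString();
    }
}
```

This only hardens the log path; the comment's actual suggestion, deferring the archive until the message is built, addresses the ordering itself.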
[jira] [Commented] (HBASE-14279) Race condition in ConcurrentIndex
[ https://issues.apache.org/jira/browse/HBASE-14279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718440#comment-14718440 ] Hadoop QA commented on HBASE-14279: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752702/HBASE-14279_v2.patch against master branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d. ATTACHMENT ID: 12752702 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1850 checkstyle errors (more than the master's current 1849 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 10 zombie test(s): at org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatTestBase.testScan(MultiTableInputFormatTestBase.java:255) at org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatTestBase.testScanEmptyToAPP(MultiTableInputFormatTestBase.java:202) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testSimpleHFileSplit(TestLoadIncrementalHFiles.java:153) at org.apache.hadoop.hbase.mapreduce.TestRowCounter.testRowCounterExclusiveColumn(TestRowCounter.java:111) at org.apache.hadoop.hbase.mapreduce.TestImportTSVWithVisibilityLabels.testMROnTableWithDeletes(TestImportTSVWithVisibilityLabels.java:182) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingHFileSplit(TestLoadIncrementalHFiles.java:193) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testRegionCrossingHFileSplit(TestLoadIncrementalHFiles.java:171) at org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles.testSimpleHFileSplit(TestLoadIncrementalHFiles.java:153) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15315//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15315//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15315//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15315//console This message is automatically generated. Race condition in ConcurrentIndex - Key: HBASE-14279 URL: https://issues.apache.org/jira/browse/HBASE-14279 Project: HBase Issue Type: Bug Reporter: Hiroshi Ikeda Assignee: Heng Chen Priority: Minor Attachments: HBASE-14279.patch, HBASE-14279_v2.patch {{ConcurrentIndex.put}} and {{remove}} are in race condition. It is possible to remove a non-empty set, and to add a value to a removed set. 
Also {{ConcurrentIndex.values}} is vague in the sense that the returned set sometimes tracks the current state and sometimes doesn't. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
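One way the put/remove race described above could be closed is to make the per-key check-and-unlink atomic with {{ConcurrentHashMap.compute}}. This is an illustrative rewrite under that assumption, not the actual ConcurrentIndex code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not HBase's ConcurrentIndex): put/remove are made
// atomic per key via ConcurrentHashMap.compute, so a set can never be
// unlinked while it still holds values, and no value can be added to a
// set that has already been unlinked from the map.
public class AtomicIndex<K, V> {
  private final ConcurrentHashMap<K, Set<V>> index = new ConcurrentHashMap<>();

  public void put(K key, V value) {
    // compute runs atomically for the key: the set is created (if needed)
    // and the value added while the mapping is held.
    index.compute(key, (k, set) -> {
      if (set == null) {
        set = ConcurrentHashMap.newKeySet();
      }
      set.add(value);
      return set;
    });
  }

  public boolean remove(K key, V value) {
    final boolean[] removed = new boolean[1];
    index.compute(key, (k, set) -> {
      if (set == null) {
        return null;
      }
      removed[0] = set.remove(value);
      return set.isEmpty() ? null : set; // unlink only while atomically empty
    });
    return removed[0];
  }

  public Set<V> values(K key) {
    return index.get(key);
  }
}
```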
[jira] [Created] (HBASE-14332) give the table state when we encountered exception while disable/enable table
Nick.han created HBASE-14332: Summary: give the table state when we encountered exception while disable/enable table Key: HBASE-14332 URL: https://issues.apache.org/jira/browse/HBASE-14332 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Nick.han Assignee: Nick.han Priority: Minor Fix For: 2.0.0 This patch is a suggestion for a better user experience: when we disable a table and the table is not enabled, we receive an exception, but the exception is too brief. Sometimes we want to know what state the table is in, so that we can see why the table can't be disabled. For example, I once encountered a problem where the table was neither disabled nor enabled after my region server crashed; if we report the table state, the problem can be found more quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
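The improvement asked for above can be sketched in a few lines. All names here are illustrative stand-ins, not HBase's actual Admin or TableState API:

```java
// Minimal, self-contained sketch of the proposal: surface the observed
// table state in the error message instead of a bare "table not enabled",
// so an operator can tell a stuck ENABLING/DISABLING table apart from a
// plainly enabled one. The enum and method are hypothetical.
public class TableStateCheck {
  public enum State { ENABLED, DISABLED, ENABLING, DISABLING }

  public static String disableErrorMessage(String table, State current) {
    // Including the state answers "why can't this table be disabled?"
    return "Cannot disable " + table + ": table is in state " + current;
  }
}
```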
[jira] [Updated] (HBASE-14325) Add snapshotinfo command to hbase script
[ https://issues.apache.org/jira/browse/HBASE-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick.han updated HBASE-14325: - Attachment: (was: HBASE-14332.patch) Add snapshotinfo command to hbase script Key: HBASE-14325 URL: https://issues.apache.org/jira/browse/HBASE-14325 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 2.0.0, 1.2.0 Reporter: Samir Ahmic Assignee: Samir Ahmic Priority: Minor Attachments: HBASE-14325.patch Since we already have commands like hbck, hfile, wal etc. that are used for getting various types of information about HBase components, it makes sense to me to add the SnapshotInfo tool to the collection. If nobody objects I will add a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14325) Add snapshotinfo command to hbase script
[ https://issues.apache.org/jira/browse/HBASE-14325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick.han updated HBASE-14325: - Attachment: HBASE-14332.patch Add snapshotinfo command to hbase script Key: HBASE-14325 URL: https://issues.apache.org/jira/browse/HBASE-14325 Project: HBase Issue Type: Improvement Components: scripts Affects Versions: 2.0.0, 1.2.0 Reporter: Samir Ahmic Assignee: Samir Ahmic Priority: Minor Attachments: HBASE-14325.patch, HBASE-14332.patch Since we already have commands like hbck, hfile, wal etc. that are used for getting various types of information about HBase components, it makes sense to me to add the SnapshotInfo tool to the collection. If nobody objects I will add a patch for this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14332) give the table state when we encountered exception while disable/enable table
[ https://issues.apache.org/jira/browse/HBASE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick.han updated HBASE-14332: - Attachment: HBASE-14332.patch give the table state when we encountered exception while disable/enable table - Key: HBASE-14332 URL: https://issues.apache.org/jira/browse/HBASE-14332 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Nick.han Assignee: Nick.han Priority: Minor Fix For: 2.0.0 Attachments: HBASE-14332.patch This patch is a suggestion for a better user experience: when we disable a table and the table is not enabled, we receive an exception, but the exception is too brief. Sometimes we want to know what state the table is in, so that we can see why the table can't be disabled. For example, I once encountered a problem where the table was neither disabled nor enabled after my region server crashed; if we report the table state, the problem can be found more quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14289) Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14289: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review, Andrew. Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 --- Key: HBASE-14289 URL: https://issues.apache.org/jira/browse/HBASE-14289 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.15 Attachments: 14289-0.98-v2.txt, 14289-0.98-v3.txt, 14289-0.98-v4.txt, 14289-0.98-v5.txt The default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. This issue backports HBASE-13965 to 0.98 branch to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14332) give the table state when we encountered exception while disable/enable table
[ https://issues.apache.org/jira/browse/HBASE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14332: --- Status: Patch Available (was: Open) give the table state when we encountered exception while disable/enable table - Key: HBASE-14332 URL: https://issues.apache.org/jira/browse/HBASE-14332 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Nick.han Assignee: Nick.han Priority: Minor Fix For: 2.0.0 Attachments: HBASE-14332.patch This patch is a suggestion for a better user experience: when we disable a table and the table is not enabled, we receive an exception, but the exception is too brief. Sometimes we want to know what state the table is in, so that we can see why the table can't be disabled. For example, I once encountered a problem where the table was neither disabled nor enabled after my region server crashed; if we report the table state, the problem can be found more quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14332) Show the table state when we encounter exception while disabling / enabling table
[ https://issues.apache.org/jira/browse/HBASE-14332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14332: --- Summary: Show the table state when we encounter exception while disabling / enabling table (was: give the table state when we encountered exception while disable/enable table) Show the table state when we encounter exception while disabling / enabling table - Key: HBASE-14332 URL: https://issues.apache.org/jira/browse/HBASE-14332 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Nick.han Assignee: Nick.han Priority: Minor Fix For: 2.0.0 Attachments: HBASE-14332.patch This patch is a suggestion for a better user experience: when we disable a table and the table is not enabled, we receive an exception, but the exception is too brief. Sometimes we want to know what state the table is in, so that we can see why the table can't be disabled. For example, I once encountered a problem where the table was neither disabled nor enabled after my region server crashed; if we report the table state, the problem can be found more quickly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12187) Review in source the paper Simple Testing Can Prevent Most Critical Failures
[ https://issues.apache.org/jira/browse/HBASE-12187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14718631#comment-14718631 ] Sean Busbey commented on HBASE-12187: - [~d.yuan], did your changes to error-prone ever get accepted into the main error-prone repository? I'm having some difficulty finding them. Review in source the paper Simple Testing Can Prevent Most Critical Failures -- Key: HBASE-12187 URL: https://issues.apache.org/jira/browse/HBASE-12187 Project: HBase Issue Type: Bug Reporter: stack Priority: Critical Attachments: HBASE-12187.patch, abortInOvercatch.warnings.txt, emptyCatch.warnings.txt, todoInCatch.warnings.txt Review the helpful paper https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf It describes 'catastrophic failures', especially issues where exceptions are thrown but not properly handled. Their static analysis tool Aspirator turns up a bunch of the obvious offenders (Lets add to test-patch.sh alongside findbugs?). This issue is about going through code base making sub-issues to root out these and others (Don't we have the test described in figure #6 already? I thought we did? If we don't, need to add). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14331) a single callQueue related improvements
[ https://issues.apache.org/jira/browse/HBASE-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720048#comment-14720048 ] Ted Yu commented on HBASE-14331: In HBase, there is BoundedConcurrentLinkedQueue. Can you expand your program by including BoundedConcurrentLinkedQueue? Thanks a single callQueue related improvements --- Key: HBASE-14331 URL: https://issues.apache.org/jira/browse/HBASE-14331 Project: HBase Issue Type: Improvement Components: IPC/RPC, Performance Reporter: Hiroshi Ikeda Priority: Minor Attachments: CallQueuePerformanceTestApp.java {{LinkedBlockingQueue}} well separates locks between the {{take}} method and the {{put}} method, but not between takers, and not between putters. These methods are implemented to take locks at almost the beginning of their logic. HBASE-11355 introduces multiple call-queues to reduce such possible congestion, but I doubt that it is required to stick to {{BlockingQueue}}. There are other shortcomings of using {{BlockingQueue}}. When using multiple queues, since {{BlockingQueue}} blocks threads, it is required to prepare enough threads for each queue. It is possible that there is a queue starving for threads while there is another queue where threads are idle. Even if you can tune parameters to avoid such situations, the tuning is not so trivial. I suggest using a single {{ConcurrentLinkedQueue}} with {{Semaphore}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
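The suggestion above — a single {{ConcurrentLinkedQueue}} paired with a {{Semaphore}} — can be sketched as follows. The class and method names are illustrative; the attached CallQueuePerformanceTestApp.java is the authoritative benchmark:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;

// Sketch of the proposed approach: a lock-free queue plus a semaphore
// that counts available items, so takers park on the semaphore instead
// of contending on a BlockingQueue's internal lock.
public class SemaphoreCallQueue<T> {
  private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
  private final Semaphore available = new Semaphore(0);

  public void put(T call) {
    queue.offer(call);     // lock-free enqueue
    available.release();   // signal that one more item exists
  }

  public T take() throws InterruptedException {
    available.acquire();   // block until an item is known to exist
    return queue.poll();   // non-null: a permit implies an element
  }

  /** Non-blocking variant: returns null if nothing is queued. */
  public T poll() {
    return available.tryAcquire() ? queue.poll() : null;
  }
}
```

Because the semaphore is shared, idle handler threads drain whichever calls arrive, avoiding the per-queue thread-starvation problem the comment describes.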
[jira] [Updated] (HBASE-13469) [branch-1.1] Procedure V2 - Make procedure v2 configurable in branch-1.1
[ https://issues.apache.org/jira/browse/HBASE-13469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13469: --- Parent Issue: HBASE-14336 (was: HBASE-12439) [branch-1.1] Procedure V2 - Make procedure v2 configurable in branch-1.1 Key: HBASE-13469 URL: https://issues.apache.org/jira/browse/HBASE-13469 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 1.1.0 Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Fix For: 1.1.0 Attachments: HBASE-13469.v2-branch-1.1.patch In branch-1, I think we want proc v2 to be configurable, so that if any non-recoverable issue is found, at least there is a workaround. We already have the handlers and code laying around. It will be just introducing the config to enable / disable. We can even make it dynamically configurable via the new framework. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13415: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: HBASE-13415.v1-master.patch, HBASE-13415.v2-master.patch, HBASE-13415.v3-master.patch The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example in the case of create table, the second one will throw TableExistsException). One idea is to use client generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
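The nonce-cache idea described above can be sketched minimally. The names below are illustrative, not the actual ProcedureExecutor API:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the nonce cache: the first submission with a
// given nonce registers its procId; a re-submission with the same nonce
// gets the original procId back instead of spawning a second procedure.
public class NonceCache {
  private final ConcurrentHashMap<Long, Long> nonceToProcId = new ConcurrentHashMap<>();

  /** Returns the procId bound to this nonce, registering it if new. */
  public long submit(long nonce, long candidateProcId) {
    Long existing = nonceToProcId.putIfAbsent(nonce, candidateProcId);
    return existing != null ? existing : candidateProcId;
  }
}
```

In the real design the nonce would also be persisted with the procedure in the store, so the cache can be rebuilt after a master failover.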
[jira] [Updated] (HBASE-13211) Procedure V2 - master Enable/Disable table
[ https://issues.apache.org/jira/browse/HBASE-13211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13211: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - master Enable/Disable table -- Key: HBASE-13211 URL: https://issues.apache.org/jira/browse/HBASE-13211 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Labels: reliability Fix For: 2.0.0, 1.1.0 Attachments: EnableDisableTableServer-no-gen-files.v1-master.patch, HBASE-13211-v2-branch-1.patch, HBASE-13211-v2.patch Original Estimate: 120h Time Spent: 216h Remaining Estimate: 0h master side, part of HBASE-12439 starts up the procedure executor on the master and replaces the enable/disable table handlers with the procedure version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13212) Procedure V2 - master Create/Modify/Delete namespace
[ https://issues.apache.org/jira/browse/HBASE-13212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13212: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - master Create/Modify/Delete namespace Key: HBASE-13212 URL: https://issues.apache.org/jira/browse/HBASE-13212 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Labels: reliability Fix For: 2.0.0, 1.3.0 Attachments: HBASE-13212.v1-branch-1.patch, HBASE-13212.v1-master.patch, HBASE-13212.v2-master.patch, HBASE-13212.v3-master.patch Original Estimate: 168h Remaining Estimate: 168h master side, part of HBASE-12439 starts up the procedure executor on the master and replaces the create/modify/delete namespace handlers with the procedure version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13290) Procedure v2 - client enable/disable table sync
[ https://issues.apache.org/jira/browse/HBASE-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13290: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - client enable/disable table sync --- Key: HBASE-13290 URL: https://issues.apache.org/jira/browse/HBASE-13290 Project: HBase Issue Type: Sub-task Components: Client Affects Versions: 2.0.0, 1.1.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Fix For: 2.0.0, 1.1.0 Attachments: EnableDisableTableClientPV2-draft.v0-master.patch, HBASE-13290-v2-branch-1.patch, HBASE-13290-v2.patch Original Estimate: 120h Time Spent: 72h Remaining Estimate: 0h client side, part of HBASE-12439/HBASE-13211; it uses the new procedure code to know when the procedure is completed, and to have proper sync/async behavior on enable/disable table. For 1.1, it has to be binary compatible (a 1.0 client can talk to a 1.1 server; a 1.1 client can talk to a 1.0 server). Binary compatibility is TBD for 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14289) Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720901#comment-14720901 ] Hudson commented on HBASE-14289: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1058 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1058/]) HBASE-14289 Revert due to compilation problem in downstream project(s) (tedyu: rev c78260cd1edc85a5cc1d7e7caaedf499b12e67b9) * hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java * hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java * hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java * hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java * hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java * hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java * 
hbase-hadoop1-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource * hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 --- Key: HBASE-14289 URL: https://issues.apache.org/jira/browse/HBASE-14289 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.15 Attachments: 14289-0.98-v2.txt, 14289-0.98-v3.txt, 14289-0.98-v4.txt, 14289-0.98-v5.txt The default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. This issue backports HBASE-13965 to 0.98 branch to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720902#comment-14720902 ] Hudson commented on HBASE-14315: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1058 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1058/]) HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev 7a4fa7f20cf8084ce57dcaa83e0c4f430d290736) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan heavy loads shows up at top. Based on the run and data this can safe between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720933#comment-14720933 ] Elliott Clark commented on HBASE-6721: -- bq.Maybe you can elaborate on this point? Sure let me write up something more on that point. bq.Let's not let the perfect be the enemy of the good. I'm not asking for perfect. I'm asking not to take a half step back for most users so that one user can merge this feature and take a single step forward. bq.Meanwhile we have a patch on deck and we need to be evaluating it and its contributor's concerns on their merit. Yeah and I have evaluated it and used my knowledge and judgement and cast my vote on this patch and feature in accordance with the Apache Foundation rules (http://www.apache.org/foundation/voting.html#votes-on-code-modification) . I have added my technical reasons. I have even outlined what I would need to vote a different way. bq. the 0.90 master rewrite (remember that?) Yeah that has since proved to be the wrong way to go. It put way too much in ZK and now we've spent years un-doing it. We would have been better served with asking for parts to be pluggable and not on by default. bq.or the memcache-based block cache carried That followed the exact same bar that I'm requesting here. No added complexity on the default use case. I've even moved it into a different module so that it will be exactly what I'm asking for here. 
RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks
[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720960#comment-14720960 ] Hadoop QA commented on HBASE-14283: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753103/HBASE-14283-v2.patch against master branch at commit 0d06d8ddd0aa9aff7476fb6a7acd6af1d24ba3fc. ATTACHMENT ID: 12753103 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1849 checkstyle errors (more than the master's current 1846 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15322//console This message is automatically generated. Reverse scan doesn’t work with HFile inline index/bloom blocks -- Key: HBASE-14283 URL: https://issues.apache.org/jira/browse/HBASE-14283 Project: HBase Issue Type: Bug Reporter: Ben Lau Assignee: Ben Lau Attachments: HBASE-14283-v2.patch, HBASE-14283.patch, hfile-seek-before.patch Reverse scans do not work if an HFile contains inline bloom blocks or leaf level index blocks. The reason is because the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous which is not the case in HFile V2 and beyond. Attached is a first cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes: (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1 and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance. This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is I want to confirm that you guys would be fine with me revising the HFile.Reader interface. Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. 
Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices); it only constructs the IO streams and so has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property. We would like to do 3 things with community approval: (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams (2) Merge the fixes for index blocks and bloom blocks into open source (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
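The contiguity assumption behind this bug can be shown with toy numbers. Everything below (offsets, sizes, names) is made up for illustration and is not real HFile layout code:

```java
// Toy model of the seekBefore() bug described above. seekBefore derives
// the previous data block's on-disk size as (current data block offset -
// previous data block offset), which is only right when data blocks are
// laid out back-to-back. With an inline index/bloom block between them,
// the computed size also swallows the inline block.
public class SeekBeforeSketch {
  // Hypothetical layout:
  //   data block   @ offset 0,   100 bytes
  //   inline index @ offset 100,  40 bytes
  //   data block   @ offset 140
  static final long PREV_DATA_OFFSET = 0;
  static final long CUR_DATA_OFFSET = 140;
  static final long ACTUAL_PREV_DATA_SIZE = 100;

  /** The contiguity assumption: size = gap between data block offsets. */
  static long assumedPrevDataSize(long curOffset, long prevDataOffset) {
    return curOffset - prevDataOffset;
  }
}
```

Here the assumed size comes out 40 bytes too large (the inline index block's size), which is exactly why a `prevBlockSize` field in the block header, as proposed in item (3), would make the computation both correct and cheap.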
[jira] [Created] (HBASE-14336) Procedure V2 Phase 1 - Procedure Framework and Making DDL Operations fault tolerant
Stephen Yuan Jiang created HBASE-14336: -- Summary: Procedure V2 Phase 1 - Procedure Framework and Making DDL Operations fault tolerant Key: HBASE-14336 URL: https://issues.apache.org/jira/browse/HBASE-14336 Project: HBase Issue Type: Task Components: proc-v2 Affects Versions: 1.1.0, 2.0.0, 1.2.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang This is the first phase of Procedure V2 (HBASE-12439) - Core framework - re-implement Namespace/Table/Column DDLs as multi-step procedures with rollback/roll-forward ability in case of failure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14108) Administrative Task: provide an API to abort a procedure
[ https://issues.apache.org/jira/browse/HBASE-14108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14108: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Administrative Task: provide an API to abort a procedure Key: HBASE-14108 URL: https://issues.apache.org/jira/browse/HBASE-14108 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Attachments: HBASE-14108.v1-master.patch With Procedure-V2 in production since the HBASE 1.1 release, there is a need to abort a procedure (eg. a long-running procedure that is stuck somewhere and blocks others). The command could come either from the shell or the Web UI. This task tracks the work to provide an API to abort a procedure (either rollback or simply quit). This API could be used either from the shell or the Web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14212) Add IT test for procedure-v2-based namespace DDL
[ https://issues.apache.org/jira/browse/HBASE-14212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14212: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Add IT test for procedure-v2-based namespace DDL Key: HBASE-14212 URL: https://issues.apache.org/jira/browse/HBASE-14212 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang An integration test for proc-v2-based table DDLs was created in HBASE-12439 during the HBASE 1.1 release. With HBASE-13212, proc-v2-based namespace DDLs are introduced. We need to enhance the IT from HBASE-12429 to include namespace DDLs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13993) WALProcedureStore fencing is not effective if new WAL rolls
[ https://issues.apache.org/jira/browse/HBASE-13993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13993: --- Parent Issue: HBASE-14336 (was: HBASE-12439) WALProcedureStore fencing is not effective if new WAL rolls Key: HBASE-13993 URL: https://issues.apache.org/jira/browse/HBASE-13993 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-13993-v2.patch, HBASE-13993-v3.patch, hbase-13993_v1.patch WAL fencing for the WALProcedureStore is a bit different than the fencing done for region server WALs. In case of this sequence of events, the WAL is not fenced (especially with HBASE-13832 patch): - master1 creates WAL with logId = 1: {{/MasterProcWALs/state-0001.log}} - master2 takes over, fences logId = 1 with recoverLease(), creates logId=2: {{/MasterProcWALs/state-0002.log}}. - master2 writes some procedures and rolls the logId2, and creates logId = 3, and deletes logId = 2. - master1 now tries to write a procedure, gets lease mismatch, rolls the log from 1 to 2, and succeeds the write since it can write logId = 2 (master2 uses logId=3 now). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13832: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch, HBASE-13832-v6.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement logic similar to FSHLog's: if an IOException is thrown during syncLoop in WALProcedureStore#start(), instead of aborting immediately, we could try to roll the log and see whether this resolves the issue; if the new log cannot be created, or rolling the log throws further exceptions, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
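The proposed roll-before-abort behavior can be sketched as follows (the `Wal` interface and method names here are hypothetical stand-ins, not the actual patch):

```java
import java.io.IOException;

// Minimal sketch of the proposal (hypothetical Wal interface, not the actual
// patch): on a sync failure, attempt a log roll before giving up, mirroring
// what FSHLog does. Only if the roll itself fails do we abort.
interface Wal {
    void sync() throws IOException;
    boolean roll();                      // true if a fresh log was created
}

class SyncLoop {
    // Returns true if the sync eventually succeeded; false means the master
    // should abort (the roll failed or the retry budget is exhausted).
    static boolean syncWithRollRetry(Wal wal, int maxRolls) {
        for (int rolls = 0; ; rolls++) {
            try {
                wal.sync();
                return true;
            } catch (IOException e) {
                if (rolls >= maxRolls || !wal.roll()) {
                    return false;        // cannot recover: abort
                }
                // Fresh log created; retry the sync on the new writer.
            }
        }
    }
}
```

With a transient datanode-pipeline error like the one in the stack trace above, a single successful roll would let the master continue starting instead of aborting outright.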
[jira] [Updated] (HBASE-13950) Add a NoopProcedureStore for testing
[ https://issues.apache.org/jira/browse/HBASE-13950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13950: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Add a NoopProcedureStore for testing Key: HBASE-13950 URL: https://issues.apache.org/jira/browse/HBASE-13950 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13950-v0-branch-1.patch, HBASE-13950-v1-branch-1.patch, HBASE-13950-v1.patch Add a NoopProcedureStore and a helper in ProcedureTestingUtil to submitAndWait() a procedure without having to do anything else. This is useful to avoid extra code, as in the case of TestAssignmentManager.processServerShutdownHandler() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14051) Undo workarounds in IntegrationTestDDLMasterFailover for client double submit
[ https://issues.apache.org/jira/browse/HBASE-14051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14051: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Undo workarounds in IntegrationTestDDLMasterFailover for client double submit - Key: HBASE-14051 URL: https://issues.apache.org/jira/browse/HBASE-14051 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Sophia Feng Fix For: 2.0.0, 1.2.0, 1.3.0 Now that nonce support for dealing with double-submits from the client (HBASE-13415) is committed, we should undo the workarounds done in HBASE-13470 for this part. We needed these workarounds for 1.1.1, but we should not need them anymore for proper testing on 1.2+. [~syuanjiang], [~fengs], [~mbertozzi] FYI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14107) Administrative Task: Provide an API to List all procedures
[ https://issues.apache.org/jira/browse/HBASE-14107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14107: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Administrative Task: Provide an API to List all procedures --- Key: HBASE-14107 URL: https://issues.apache.org/jira/browse/HBASE-14107 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang With Procedure V2 in production since the HBase 1.1 release, there is a need to list all procedures (running, queued, recently completed) from the HBase shell (or Web UI). This JIRA is to track the work to add an API to list all procedures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14017) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion
[ https://issues.apache.org/jira/browse/HBASE-14017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14017: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - MasterProcedureQueue fix concurrency issue on table queue deletion - Key: HBASE-14017 URL: https://issues.apache.org/jira/browse/HBASE-14017 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2 Attachments: HBASE-14017-v0.patch, HBASE-14017-v0.patch, HBASE-14017.as-pushed-master.patch, HBASE-14017.v1-branch1.1.patch, HBASE-14017.v1-branch1.1.patch [~syuanjiang] found a concurrency issue in the procedure queue deletion, where we don't hold an exclusive lock before deleting the table queue {noformat} Thread 1: Create table is running - the queue is empty and wlock is false Thread 2: markTableAsDeleted sees the queue empty and wlock=false Thread 1: tryWrite() sets wlock=true; too late Thread 2: deletes the queue Thread 1: never able to release the lock - NPE when trying to get the queue {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
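The direction of the fix can be illustrated with a toy queue (hypothetical class, not the actual MasterProcedureQueue patch): the "empty and unlocked" check plus the delete, and the write-lock acquisition, must be atomic with respect to each other, so the deletion can never slip in between the emptiness observation and `wlock=true`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy illustration (hypothetical class, not the actual MasterProcedureQueue
// fix): guarding both markTableAsDeleted() and the write-lock acquisition
// with the same monitor makes the "empty && !wlock" check and the deletion
// atomic, closing the race described in the {noformat} trace above.
class TableQueue {
    private final Deque<String> procs = new ArrayDeque<>();
    private boolean wlock = false;
    private boolean deleted = false;

    synchronized boolean tryExclusiveLock() {
        if (deleted || wlock) {
            return false;                // queue gone or already locked
        }
        wlock = true;
        return true;
    }

    synchronized void releaseExclusiveLock() {
        wlock = false;
    }

    // Deletes only when the queue is empty AND nobody holds the write lock,
    // under the same monitor as tryExclusiveLock().
    synchronized boolean markTableAsDeleted() {
        if (procs.isEmpty() && !wlock) {
            deleted = true;
            return true;
        }
        return false;
    }
}
```

Under this scheme, thread 2's delete fails while thread 1 holds the lock, and thread 1's lock attempt fails once the queue is gone, instead of the NPE.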
[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14334: -- Issue Type: Improvement (was: Bug) Move Memcached block cache in to it's own optional module. -- Key: HBASE-14334 URL: https://issues.apache.org/jira/browse/HBASE-14334 Project: HBase Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14334.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720908#comment-14720908 ] Hudson commented on HBASE-14315: FAILURE: Integrated in HBase-1.3 #139 (See [https://builds.apache.org/job/HBase-1.3/139/]) HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev c277166fd1f3104c0db9011f700eebf404b812ad) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan-heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
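The shape of this micro-optimization can be modeled with a counting wrapper (a simplified toy, not StoreScanner itself): the old shape pays a leading peek at the top of every next() call plus the trailing peek that seeds the next row, while the cached shape carries the trailing peek over.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the optimization (toy code, not StoreScanner itself):
// nextOld() peeks once at the top of each call and once more after finishing
// the row; nextNew() remembers the trailing peek so the following call can
// skip the leading one, saving one peek per row.
class PeekModel {
    private final Queue<Integer> heap;
    private Integer cached;              // trailing peek carried into the next call
    int peekCalls = 0;                   // instrumentation for the example

    PeekModel(Queue<Integer> heap) {
        this.heap = heap;
    }

    private Integer peek() {
        peekCalls++;
        return heap.peek();
    }

    // Old shape: leading peek + trailing peek, two peeks per row.
    Integer nextOld() {
        Integer cur = peek();
        if (cur == null) {
            return null;
        }
        heap.poll();
        peek();                          // trailing peek, discarded on return
        return cur;
    }

    // New shape: reuse the previous trailing peek as this call's leading peek.
    Integer nextNew() {
        Integer cur = (cached != null) ? cached : peek();
        if (cur == null) {
            return null;
        }
        heap.poll();
        cached = peek();                 // one peek per row
        return cur;
    }
}
```

For n rows the old shape performs roughly 2n peeks while the cached shape performs roughly n, which is where a few percent of runtime on scan-heavy loads can come from.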
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720921#comment-14720921 ] Andrew Purtell commented on HBASE-6721: --- Let's not let the perfect be the enemy of the good. A proper multi-layer admission control change isn't on the table, it isn't on anyone's roadmap, it isn't even something proposed on a JIRA and/or in a design document. Even if we have a proposal for HBase this will certainly be considered imperfect and incomplete by some without 100% agreement and a plan at the HDFS level, and getting that is as likely as finding a unicorn wandering around downtown SF. (Ok... maybe a horse dressed to look like a unicorn could be a thing...) Meanwhile we have a patch on deck and we need to be evaluating it and its contributor's concerns on their merit. This is something that one of our esteemed users runs in production, is persistent about getting in and responsive to feedback, and both of those things in my opinion should carry a lot of weight. The same kind of weight that previously proposed changes like HFileV2 or the 0.90 master rewrite (remember that?), or the memcache-based block cache carried, or pending IPv6 related changes. That said, I don't think it's ready to be merged into master. We have it up in a feature branch. Let's continue that, address concerns, make sure it's totally optional for those who don't want it, measure its impact. 
RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720938#comment-14720938 ] Hudson commented on HBASE-14322: FAILURE: Integrated in HBase-1.2 #143 (See [https://builds.apache.org/job/HBase-1.2/143/]) HBASE-14322 Add a master priority function to let master use it's threads (eclark: rev 5f15583d342cd3129648452cdd7a6c686ad7fa81) * hbase-server/src/test/java/org/apache/hadoop/hbase/QosTestHelper.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/AnnotationReadingPriorityFunction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterAnnotationReadingPriorityFunction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterQosFunction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQosFunction.java Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322-v7.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13204) Procedure v2 - client create/delete table sync
[ https://issues.apache.org/jira/browse/HBASE-13204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13204: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - client create/delete table sync -- Key: HBASE-13204 URL: https://issues.apache.org/jira/browse/HBASE-13204 Project: HBase Issue Type: Sub-task Components: master Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13204-v0-branch-1.patch, HBASE-13204-v0.patch client side, part of HBASE-12439/HBASE-13203 it uses the new procedure code to know when the procedure is completed, and to have proper sync behavior on create/delete table. Review: https://reviews.apache.org/r/32391/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13209) Procedure V2 - master Add/Modify/Delete Column Family
[ https://issues.apache.org/jira/browse/HBASE-13209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13209: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - master Add/Modify/Delete Column Family - Key: HBASE-13209 URL: https://issues.apache.org/jira/browse/HBASE-13209 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Labels: reliability Fix For: 2.0.0, 1.1.0 Attachments: AlterColumnFamily-no-gen-file.v1-master.patch, HBASE-13209-v2.patch, HBASE-13209-v3-branch-1.patch, HBASE-13209-v3.patch Original Estimate: 168h Time Spent: 384h Remaining Estimate: 0h master side, part of HBASE-12439 starts up the procedure executor on the master and replaces the add/modify/delete handlers with the procedure version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13210) Procedure V2 - master Modify table
[ https://issues.apache.org/jira/browse/HBASE-13210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13210: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - master Modify table -- Key: HBASE-13210 URL: https://issues.apache.org/jira/browse/HBASE-13210 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Labels: reliablity Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13210-v2.patch, HBASE-13210-v3.patch, ModifyTableProcedure-no-gen-file.v1-master.patch Original Estimate: 72h Time Spent: 168h Remaining Estimate: 0h master side, part of HBASE-12439 starts up the procedure executor on the master and replaces the modify table handlers with the procedure version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13202) Procedure v2 - core framework
[ https://issues.apache.org/jira/browse/HBASE-13202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13202: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - core framework - Key: HBASE-13202 URL: https://issues.apache.org/jira/browse/HBASE-13202 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13202-v0-hbase-12439.patch, HBASE-13202-v1-hbase-12439.patch, HBASE-13202-v2.patch, HBASE-13203-v2-branch_1.patch, ProcedureV2-overview.pdf core package, part of HBASE-12439 this is just the proc-v2 submodule. it depends only on hbase-common. https://reviews.apache.org/r/27703/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13203) Procedure v2 - master create/delete table
[ https://issues.apache.org/jira/browse/HBASE-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13203: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - master create/delete table - Key: HBASE-13203 URL: https://issues.apache.org/jira/browse/HBASE-13203 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13203-v0.patch, HBASE-13203-v1-branch-1.patch, HBASE-13203-v1.patch master side, part of HBASE-12439 starts up the procedure executor on the master and replaces the create/delete table handlers with the procedure version. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13685) Procedure v2 - Add maxProcId to the wal header
[ https://issues.apache.org/jira/browse/HBASE-13685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13685: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - Add maxProcId to the wal header -- Key: HBASE-13685 URL: https://issues.apache.org/jira/browse/HBASE-13685 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 1.1.0 Attachments: HBASE-13685-v0.patch while working on HBASE-13476 I found that having the max-proc-id in the wal header, allows some nice optimizations. [~ndimiduk] since 1.1 is not released yet, can we get this in so we can avoid all extra handling? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13759) Improve procedure yielding
[ https://issues.apache.org/jira/browse/HBASE-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13759: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Improve procedure yielding -- Key: HBASE-13759 URL: https://issues.apache.org/jira/browse/HBASE-13759 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.2.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Trivial Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13759-v0.patch, HBASE-13759-v1.patch Adds the ability to yield the procedure at every execution step. By default, a procedure will try to run from beginning to end without stopping. Yielding allows procedures to be nice to other procedures; one usage example is ServerShutdownHandler, where we want everyone to make some progress. Also allows a procedure to throw InterruptedException; the default handling will be: ask the master if there is an abort or stop. If there is, stop executions and exit. Else, clear the IE and carry on executing; the interrupted procedure will retry. If the procedure implementor wants a different behavior, the IE can be caught and custom handling can be performed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
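The default InterruptedException handling described above can be sketched as follows (the types here are hypothetical stand-ins, not the framework's actual classes):

```java
// Sketch of the default InterruptedException handling described above
// (hypothetical types, not the framework's actual classes): if the master is
// aborting/stopping, stop executions and exit; otherwise clear the interrupt
// and carry on, letting the interrupted procedure retry.
interface Step {
    void exec() throws InterruptedException;
}

interface MasterState {
    boolean isStopping();                // abort or stop requested?
}

class DefaultInterruptHandling {
    // Returns true if execution should carry on (the interrupted procedure
    // will retry the step); false if the executor should exit.
    static boolean runStep(MasterState master, Step step) {
        try {
            step.exec();
            return true;
        } catch (InterruptedException ie) {
            if (master.isStopping()) {
                return false;            // abort/stop requested: exit
            }
            Thread.interrupted();        // ensure the interrupt flag is clear
            return true;                 // carry on; the step will be retried
        }
    }
}
```

A procedure wanting different behavior would simply catch the InterruptedException inside its own step instead of letting it propagate to this default handler.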
[jira] [Updated] (HBASE-13551) Procedure V2 - Procedure classes should not be InterfaceAudience.Public
[ https://issues.apache.org/jira/browse/HBASE-13551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13551: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - Procedure classes should not be InterfaceAudience.Public --- Key: HBASE-13551 URL: https://issues.apache.org/jira/browse/HBASE-13551 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Enis Soztutar Priority: Blocker Fix For: 2.0.0, 1.1.0 Attachments: hbase-13551_v1.patch Just noticed this. We have ProcedureStore, and exceptions and some procedures declared as {{InterfaceAudience.Public}}. We should really make them Private or LimitedPrivate, since Public is for user consumption. Are we exposing Procedures to coprocessors? I do not see a use case for that, so it should be Private IMO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-11080) TestZKSecretWatcher#testKeyUpdate occasionally fails
[ https://issues.apache.org/jira/browse/HBASE-11080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-11080. Resolution: Cannot Reproduce TestZKSecretWatcher#testKeyUpdate occasionally fails Key: HBASE-11080 URL: https://issues.apache.org/jira/browse/HBASE-11080 Project: HBase Issue Type: Test Affects Versions: 0.98.1 Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/280/testReport/junit/org.apache.hadoop.hbase.security.token/TestZKSecretWatcher/testKeyUpdate/ : {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertNotNull(Assert.java:621) at org.junit.Assert.assertNotNull(Assert.java:631) at org.apache.hadoop.hbase.security.token.TestZKSecretWatcher.testKeyUpdate(TestZKSecretWatcher.java:221) {code} Here is the assertion that failed: {code} assertNotNull(newMaster); {code} Looks like the new master did not come up within 5 tries. One potential fix is to increase the number of attempts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720930#comment-14720930 ] Hudson commented on HBASE-14315: FAILURE: Integrated in HBase-0.98 #1104 (See [https://builds.apache.org/job/HBase-0.98/1104/]) HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev 7a4fa7f20cf8084ce57dcaa83e0c4f430d290736) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan-heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13476) Procedure v2 - Add Replay Order logic for child procedures
[ https://issues.apache.org/jira/browse/HBASE-13476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13476: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - Add Replay Order logic for child procedures -- Key: HBASE-13476 URL: https://issues.apache.org/jira/browse/HBASE-13476 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Fix For: 2.0.0, 1.2.0 Attachments: HBASE-13476-v0.patch, HBASE-13476-v1.patch, HBASE-13476-v2.patch The current replay order logic is only for single-level procedures (which is what we are using today for master operations). To complete the implementation for the notification-bus, we need to be able to replay child procs in the correct order too. This will not impact the current procs implementation (create/delete/modify/...); it is just a change at the framework level. https://reviews.apache.org/r/34289/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13529) Procedure v2 - WAL Improvements
[ https://issues.apache.org/jira/browse/HBASE-13529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13529: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure v2 - WAL Improvements --- Key: HBASE-13529 URL: https://issues.apache.org/jira/browse/HBASE-13529 Project: HBase Issue Type: Sub-task Components: proc-v2 Affects Versions: 2.0.0, 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13529-v0.patch, HBASE-13529-v1.patch, HBASE-13529-v2.patch, ProcedureStoreTest.java From the discussion in HBASE-12439, the WAL was slow. * There is an error around the wakeup of slotCond.await(), causing more waiting than necessary * ArrayBlockingQueue is dog slow; replace it with ConcurrentLinkedQueue * Roll the WAL only when it reaches a threshold (conf ops) to amortize the cost * hsync() is used by default while the normal WAL uses just hflush(); make it tunable via conf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
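The threshold-based rolling item can be sketched as follows (hypothetical class and configuration knob, not the actual patch):

```java
// Sketch of the threshold-based rolling idea (hypothetical class, not the
// actual patch): instead of rolling at every opportunity, accumulate the
// number of synced ops and roll only once a configured threshold is crossed,
// amortizing the cost of creating a new log across many operations.
class RollPolicy {
    private final long threshold;        // e.g. from a conf key for "ops per roll"
    private long opsSinceRoll = 0;

    RollPolicy(long threshold) {
        this.threshold = threshold;
    }

    // Called after each sync batch; true means "roll the log now".
    boolean shouldRoll(long opsInBatch) {
        opsSinceRoll += opsInBatch;
        if (opsSinceRoll >= threshold) {
            opsSinceRoll = 0;            // reset and start accumulating again
            return true;
        }
        return false;
    }
}
```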
[jira] [Updated] (HBASE-13470) High level Integration test for master DDL operations
[ https://issues.apache.org/jira/browse/HBASE-13470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13470: --- Parent Issue: HBASE-14336 (was: HBASE-12439) High level Integration test for master DDL operations - Key: HBASE-13470 URL: https://issues.apache.org/jira/browse/HBASE-13470 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Sophia Feng Fix For: 2.0.0, 1.2.0, 1.1.1, 1.3.0 Attachments: HBASE-13470-v0.patch, HBASE-13470-v1.patch, HBASE-13470-v2.patch, HBASE-13470-v3.patch, HBASE-13470-v4.patch, hbase-13740_v5.patch, hbase-13740_v6.patch, hbase-13740_v6.patch Our [~fengs] has an integration test which executes DDL operations with a new monkey to kill the active master as a high level test for the proc v2 changes. The test does random DDL operations from 20 client threads. The DDL statements are create / delete / modify / enable / disable table and CF operations. It runs HBCK to verify the end state. The test can be run on a single master, or multi master setup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14336) Procedure V2 Phase 1 - Procedure Framework and Making DDL Operations fault tolerant
[ https://issues.apache.org/jira/browse/HBASE-14336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-14336: --- Labels: reliability (was: ) Procedure V2 Phase 1 - Procedure Framework and Making DDL Operations fault tolerant --- Key: HBASE-14336 URL: https://issues.apache.org/jira/browse/HBASE-14336 Project: HBase Issue Type: Task Components: proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0, 1.3.0 Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Labels: reliability This is the first phase of Procedure V2 (HBASE-12439): - Core framework - Re-implement Namespace/Table/Column DDLs as multi-step procedures with rollback/roll-forward ability in case of failure -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13455) Procedure V2 - master truncate table
[ https://issues.apache.org/jira/browse/HBASE-13455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Yuan Jiang updated HBASE-13455: --- Parent Issue: HBASE-14336 (was: HBASE-12439) Procedure V2 - master truncate table Key: HBASE-13455 URL: https://issues.apache.org/jira/browse/HBASE-13455 Project: HBase Issue Type: Sub-task Components: master Affects Versions: 2.0.0, 1.1.0 Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Priority: Minor Fix For: 2.0.0, 1.1.0 Attachments: HBASE-13455-v0.patch, HBASE-13455-v1.patch master side, part of HBASE-12439 and replaces the truncate table handlers with the procedure version. https://reviews.apache.org/r/33102 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-11159) Some medium tests should be classified as large tests
[ https://issues.apache.org/jira/browse/HBASE-11159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-11159. Resolution: Later Some medium tests should be classified as large tests - Key: HBASE-11159 URL: https://issues.apache.org/jira/browse/HBASE-11159 Project: HBase Issue Type: Test Reporter: Ted Yu Priority: Minor Jeff Bowles made this observation based on Jenkins build. From https://builds.apache.org/job/HBase-TRUNK/5131/consoleFull : {code} Running org.apache.hadoop.hbase.client.TestHCM Tests run: 20, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 197.448 sec Running org.apache.hadoop.hbase.client.TestClientOperationInterrupt Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.34 sec {code} The above tests should be classified as large tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-11198) test-patch.sh should handle the case where trunk patch is attached along with patch for 0.98
[ https://issues.apache.org/jira/browse/HBASE-11198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu resolved HBASE-11198. Resolution: Won't Fix test-patch.sh should handle the case where trunk patch is attached along with patch for 0.98 Key: HBASE-11198 URL: https://issues.apache.org/jira/browse/HBASE-11198 Project: HBase Issue Type: Test Reporter: Ted Yu From https://builds.apache.org/job/PreCommit-HBASE-Build/9531//console : {code} -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645375/HBASE-11104_98_v3.patch against trunk revision . ATTACHMENT ID: 12645375 +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. {code} The cause was that the patch for 0.98 was slightly newer than the trunk patch. test-patch.sh should handle this case by recognizing the 'could not apply the patch' error and retrying with the trunk patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720970#comment-14720970 ] Hudson commented on HBASE-14322: SUCCESS: Integrated in HBase-1.2-IT #119 (See [https://builds.apache.org/job/HBase-1.2-IT/119/]) HBASE-14322 Add a master priority function to let master use it's threads (eclark: rev 5f15583d342cd3129648452cdd7a6c686ad7fa81) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQosFunction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterAnnotationReadingPriorityFunction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/AnnotationReadingPriorityFunction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/QosTestHelper.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterQosFunction.java Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322-v7.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720888#comment-14720888 ] Hudson commented on HBASE-14315: SUCCESS: Integrated in HBase-1.3-IT #121 (See [https://builds.apache.org/job/HBase-1.3-IT/121/]) HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev c277166fd1f3104c0db9011f700eebf404b812ad) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro-optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan-heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
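The optimization amounts to caching the head element across calls instead of re-invoking peek on the heap at the start of every row. A hedged stand-in sketch of that pattern (PeekCache and its methods are invented for illustration; the real change lives inside StoreScanner and KeyValueHeap):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration of the "save one peek per row" idea: remember the element
// returned by the previous peek and only consult the underlying structure
// when the cache is empty. "Deque" stands in for KeyValueHeap here.
public class PeekCache {
    private final Deque<String> heap = new ArrayDeque<>();
    private String cached; // head element remembered across calls

    public PeekCache(String... items) {
        for (String s : items) heap.add(s);
    }

    // Returns the current head, touching the underlying heap only when
    // no cached value is available -- one peek saved per consumed element.
    public String peekCached() {
        if (cached == null) {
            cached = heap.peek();
        }
        return cached;
    }

    public String next() {
        cached = null; // invalidate the cache once the head is consumed
        return heap.poll();
    }
}
```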
[jira] [Commented] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720915#comment-14720915 ] Hadoop QA commented on HBASE-14334: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753087/HBASE-14334.patch against master branch at commit cf4c0fb71ccb8b15549b2410083434398aa0ebb3. ATTACHMENT ID: 12753087 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> + <outputFile>${project.build.directory}/test-classes/mrapp-generated-classpath</outputFile> + <outputFile>${project.build.directory}/test-classes/mrapp-generated-classpath</outputFile> {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting {color:red}-1 core zombie tests{color}. There are 10 zombie test(s): at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:432) at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer.testRegionReplicationOnMidClusterSameHosts(TestStochasticLoadBalancer.java:454) at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:432) at org.apache.hadoop.hbase.master.balancer.BalancerTestBase.testWithCluster(BalancerTestBase.java:422) at org.apache.hadoop.hbase.master.balancer.TestStochasticLoadBalancer2.testRegionReplicasOnMidClusterHighReplication(TestStochasticLoadBalancer2.java:73) at org.apache.hadoop.hbase.security.access.TestAccessController2.testACLTableAccess(TestAccessController2.java:273) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15320//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15320//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15320//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15320//console This message is automatically generated. Move Memcached block cache in to it's own optional module. -- Key: HBASE-14334 URL: https://issues.apache.org/jira/browse/HBASE-14334 Project: HBase Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14334.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720920#comment-14720920 ] Hudson commented on HBASE-14315: FAILURE: Integrated in HBase-TRUNK #6763 (See [https://builds.apache.org/job/HBase-TRUNK/6763/]) HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev cf4c0fb71ccb8b15549b2410083434398aa0ebb3) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java Save one call to KeyValueHeap.peek per row -- Key: HBASE-14315 URL: https://issues.apache.org/jira/browse/HBASE-14315 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3 Attachments: 14315-0.98.txt, 14315-master.txt Another one of my micro-optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan-heavy loads shows up at the top. Based on the run and data this can save between 3 and 10% of runtime. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720932#comment-14720932 ] Hadoop QA commented on HBASE-14322: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753101/HBASE-14322-v7.patch against master branch at commit cf4c0fb71ccb8b15549b2410083434398aa0ebb3. ATTACHMENT ID: 12753101 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 3 zombie test(s): at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testRefreshStoreFiles(TestRegionReplicas.java:241) at org.apache.hadoop.hbase.regionserver.TestCorruptedRegionStoreFile.testLosingFileAfterScannerInit(TestCorruptedRegionStoreFile.java:173) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15321//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15321//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15321//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15321//console This message is automatically generated. Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322-v7.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720939#comment-14720939 ] Hudson commented on HBASE-14322: FAILURE: Integrated in HBase-TRUNK #6764 (See [https://builds.apache.org/job/HBase-TRUNK/6764/]) HBASE-14322 Add a master priority function to let master use it's threads (eclark: rev 0d06d8ddd0aa9aff7476fb6a7acd6af1d24ba3fc) * hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestQosFunction.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java * hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterAnnotationReadingPriorityFunction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterPriorityRpc.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSRpcServices.java * hbase-server/src/test/java/org/apache/hadoop/hbase/QosTestHelper.java * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/AnnotationReadingPriorityFunction.java * hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterQosFunction.java Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322-v7.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720940#comment-14720940 ] stack commented on HBASE-14322: --- bq. Or am I mis-understanding what you're saying? My bad. My misreading of patch. Ignore. Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322-v7.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14258) Make region_mover.rb script case insensitive with regard to hostname
[ https://issues.apache.org/jira/browse/HBASE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720955#comment-14720955 ] Ted Yu commented on HBASE-14258: @Vlad: If you agree with the formulation in addendum 2, I can commit it. Make region_mover.rb script case insensitive with regard to hostname Key: HBASE-14258 URL: https://issues.apache.org/jira/browse/HBASE-14258 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 Attachments: 14258-addendum.2, HBASE-14258.patch, HBASE-14258.patch.add The script is case sensitive and fails when case of a host name being unloaded does not match with a case of a region server name returned by HBase API. This doc clarifies IETF rules on case insensitivities in DNS: https://www.ietf.org/rfc/rfc4343.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
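The underlying fix is simply to compare host names case-insensitively, as RFC 4343 permits for DNS names. An illustrative sketch of that comparison in Java (the actual patch modifies the region_mover.rb script; HostnameMatch is an invented name):

```java
import java.util.Locale;

// Sketch of case-insensitive host name matching per RFC 4343: normalize
// both sides before comparing instead of using exact string equality.
public class HostnameMatch {
    // DNS names are ASCII, so Locale.ROOT avoids locale-specific
    // case-mapping surprises (e.g. the Turkish dotless i).
    public static boolean sameHost(String a, String b) {
        if (a == null || b == null) return a == b;
        return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }
}
```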
[jira] [Updated] (HBASE-14258) Make region_mover.rb script case insensitive with regard to hostname
[ https://issues.apache.org/jira/browse/HBASE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14258: --- Attachment: 14258-addendum.2 Make region_mover.rb script case insensitive with regard to hostname Key: HBASE-14258 URL: https://issues.apache.org/jira/browse/HBASE-14258 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 Attachments: 14258-addendum.2, HBASE-14258.patch, HBASE-14258.patch.add The script is case sensitive and fails when case of a host name being unloaded does not match with a case of a region server name returned by HBase API. This doc clarifies IETF rules on case insensitivities in DNS: https://www.ietf.org/rfc/rfc4343.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720115#comment-14720115 ] Francis Liu commented on HBASE-6721: Here's a new RB request: https://reviews.apache.org/r/27673/ RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. 
This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
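The group-based assignment described above can be sketched as restricting a table's candidate region servers to its configured group before any balancing happens, so load balancing naturally occurs per group. All names below are hypothetical stand-ins, not the GroupBasedLoadBalancer's real API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of group-aware assignment: tables map to a group, groups map
// to a set of region servers, and assignment only ever considers servers
// inside the table's group, giving per-group isolation.
public class GroupAssignment {
    private final Map<String, Set<String>> groupToServers = new HashMap<>();
    private final Map<String, String> tableToGroup = new HashMap<>();

    public void addServer(String group, String server) {
        groupToServers.computeIfAbsent(group, g -> new HashSet<>()).add(server);
    }

    public void assignTable(String table, String group) {
        tableToGroup.put(table, group);
    }

    // Candidate servers for a table are restricted to its group; a real
    // balancer would then spread the table's regions over this set only.
    public Set<String> candidateServers(String table) {
        String group = tableToGroup.getOrDefault(table, "default");
        return groupToServers.getOrDefault(group, Collections.emptySet());
    }
}
```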
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720145#comment-14720145 ] Hadoop QA commented on HBASE-6721: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753020/HBASE-6721_12.patch against master branch at commit cc1542828de93b8d54cc14497fd5937989ea1b6d. ATTACHMENT ID: 12753020 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 33 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15317//console This message is automatically generated. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In 
multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720239#comment-14720239 ] Devaraj Das commented on HBASE-14309: - I think it's good to ensure that meta is not in transition when we attempt to initiate the balancer. Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. 
While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
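The admission logic the force flag changes can be sketched as follows. BalancerGuard is an invented name, and the hbase:meta guard reflects the review comment on this issue that meta should never be forced past while in transition:

```java
// Sketch of the balancer admission check: regions in transition normally
// block a balancer run; the force flag overrides that, except when
// hbase:meta itself is in transition (assignments could not be recorded).
public class BalancerGuard {
    public static boolean shouldRunBalancer(int regionsInTransition,
                                            boolean metaInTransition,
                                            boolean force) {
        if (metaInTransition) return false;              // never force past meta RIT
        if (regionsInTransition > 0 && !force) return false; // normal guard
        return true;                                     // clear, or forced
    }
}
```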
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720069#comment-14720069 ] Francis Liu commented on HBASE-6721: Sounds good. I don't have any other changes pending so I'm going to update RB with the new patch on trunk. [~apurtell] thanks for taking care of the backport to 0.98 for the patches. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. 
The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14327) TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky
[ https://issues.apache.org/jira/browse/HBASE-14327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720087#comment-14720087 ] stack commented on HBASE-14327: --- The archiver is running in between close and open of the region in the new location? Is the file removed the result of the compaction or is it one of the inputs on the original region? The archive thinks it is safe to remove though the compaction has not 'completed' yet? That sounds like another issue. In this case, if the archiver is moving the file aside, before the new region open has a chance to work on it, yeah, delay or stop the archiver in the test. Thank you for digging in on this [~chenheng] TestIOFencing#testFencingAroundCompactionAfterWALSync is flaky -- Key: HBASE-14327 URL: https://issues.apache.org/jira/browse/HBASE-14327 Project: HBase Issue Type: Bug Components: test Reporter: Dima Spivak Priority: Critical I'm looking into some more of the flaky tests on trunk and this one seems to be particularly gross, failing about half the time in recent days. Some probably-relevant output from [a recent run|https://builds.apache.org/job/HBase-TRUNK/6761/testReport/org.apache.hadoop.hbase/TestIOFencing/testFencingAroundCompactionAfterWALSync/]: {noformat} 2015-08-27 18:50:14,318 INFO [main] hbase.TestIOFencing(326): Allowing compaction to proceed 2015-08-27 18:50:14,318 DEBUG [main] hbase.TestIOFencing$CompactionBlockerRegion(110): allowing compactions 2015-08-27 18:50:14,318 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,323 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1732): Removing store files after compaction... 2015-08-27 18:50:14,330 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(224): Archiving compacted store files. 
2015-08-27 18:50:14,331 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(224): Archiving compacted store files. 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/99e903ad7e0f4029862d0e35c5548464 2015-08-27 18:50:14,337 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/74a80cc06d134361941085bc2bb905fe 2015-08-27 18:50:14,341 DEBUG [RS:0;hemera:35619-longCompactions-1440701391112] backup.HFileArchiver(438): Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc, to hdfs://localhost:34675/user/jenkins/test-data/19edea13-027b-4c6a-9f3f-edaf1fc590ab/archive/data/default/tabletest/94d6f21f7cf387d73d8622f535c67311/family/7067addd325446089ba15ec2c77becbc 2015-08-27 18:50:14,342 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.HStore(1353): Completed compaction of 
2 (all) file(s) in family of tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311. into e138bb0ec6c64ad19efab3b44dbbcb1a(size=68.7 K), total size for store is 146.9 K. This selection was in queue for 0sec, and took 10sec to execute. 2015-08-27 18:50:14,343 INFO [RS:0;hemera:35619-longCompactions-1440701391112] regionserver.CompactSplitThread$CompactionRunner(527): Completed compaction: Request = regionName=tabletest,,1440701396419.94d6f21f7cf387d73d8622f535c67311., storeName=family, fileCount=2, fileSize=73.1 K, priority=998, time=525052314434020; duration=10sec 2015-08-27 18:50:14,343 DEBUG [RS:0;hemera:35619-shortCompactions-1440701403303] backup.HFileArchiver(438): Finished archiving from class
[jira] [Updated] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14309: --- Attachment: 14309-v7.txt Patch v7 refines the WARN message in balancer.rb When hbase:meta is in transition, ignore force flag for balancer command. Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt, 14309-v7.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. 
While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14309: --- Description: This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. was: This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. 
Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720293#comment-14720293 ] Jerry He commented on HBASE-14309: -- +1 on the WARNING and guard on meta. force is always a last resort :-)
{code}
-    LOG.debug("Not running balancer because " + regionsInTransition.size() +
+    // if hbase:meta region is in transition, result of assignment cannot be recorded
+    // ignore the force flag in that case
+    String prefix = force && !assignmentManager.getRegionStates().isMetaRegionInTransition() ?
+        "R" : "Not r";
+    LOG.debug(prefix + "unning balancer because " + regionsInTransition.size() +
       " region(s) in transition: " + org.apache.commons.lang.StringUtils.
         abbreviate(regionsInTransition.toString(), 256));
-    return false;
+    if (!force) return false;
{code}
Should return false if isMetaRegionInTransition is true. Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt, 14309-v7.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition.
The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
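The guard Jerry He calls out can be sketched as a small standalone model. This is a hypothetical simplification for illustration only: the real decision lives in HMaster.balance(), and the class, method, and parameter names below are stand-ins, not the actual HBase API.

```java
// Simplified, hypothetical model of the force-flag guard discussed above.
// The real decision lives in HMaster.balance(); names here are illustrative.
import java.util.List;

public class BalancerGuard {
  /**
   * Returns true if the balancer may run.
   *
   * @param regionsInTransition region names currently in transition
   * @param force               operator-supplied force flag from the shell
   * @param metaInTransition    whether hbase:meta is among the RIT set
   */
  public static boolean shouldRunBalancer(List<String> regionsInTransition,
                                          boolean force, boolean metaInTransition) {
    if (regionsInTransition.isEmpty()) {
      return true; // nothing blocks the balancer
    }
    // Jerry He's catch: if hbase:meta is in transition, the result of a
    // reassignment cannot be recorded, so the force flag must be ignored.
    if (metaInTransition) {
      return false;
    }
    // Otherwise honor the force flag; without it, any RIT blocks balancing.
    return force;
  }
}
```

The key ordering is that the meta check comes before the force check, which is exactly the correction the v7 patch makes over the diff quoted above.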
[jira] [Updated] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francis Liu updated HBASE-6721: --- Attachment: HBASE-6721_12.patch RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720119#comment-14720119 ] Francis Liu commented on HBASE-6721: Sorry, it's not a new RB request; I updated the old one. Was thinking whether I should just create a new RB request. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings.
Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720203#comment-14720203 ] stack commented on HBASE-14309: --- bq. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was nothing to this effect in patches until v6. Now it has below.
29     WARNING: Have you run hbck, etc, to determine the cause for region stuck in transition
30     before using the force flag ?
31     Examples:
Should be more clear that it can do damage and make less reference to 'hbck', 'etc.', and 'rit'. 'For experts only. Forcing a balance may do more damage than repair when assignment is confused.' Would be good to then link to a section in refguide on plus/minus/implications. Agree it is good to expose tools to help in extreme. Dangerous options that may do more damage than good need proper couching with warning including justification for why we need this option when you open the issue. The original, la-de-dah text is: This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. which comes across as you have nothing better to do all day but add options on commands when in fact, you have good cause. bq. In patch v6, added guard against system table region in transition even if force parameter carries value of true. What is this about? Why, if the operator thinks a force is needed, do we internally pass on it because it's a system table in transition... Seems arbitrary. You provide no reasoning for the exclusion.
Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720221#comment-14720221 ] Ted Yu commented on HBASE-14309: I can adopt the suggested warning in balancer.rb in the next patch. w.r.t. the guard against system table region in transition, the reasoning is that if, e.g. hbase:meta, is in transition, the results of reassignments can't be written. I can narrow the condition to checking hbase:meta only if you think that is proper. Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). 
However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14309: --- Release Note: This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region (other than hbase:meta) in transition - assuming RIT being transient. WARNING: For experts only. Forcing a balance may do more damage than repair when assignment is confused Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt, 14309-v7.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. 
While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720273#comment-14720273 ] Elliott Clark commented on HBASE-6721: -- I'm still officially -1 as long as this is built into the core. 99.99% (assuming 10k HBase users) of HBase users should never ever run something like this. It will make an already very operationally complex system un-workable. Because of that, anything that's in the default code, adds to the default admin, and is built in is something I can't see ever being ok with. All of this has already been tried at FB and it was a mistake. This ends up looking functionally very similar to 0.89-fb's favored nodes. (Only assign regions to a specific set of machines that's configured by the admin.) It's so bad that almost every time we try and solve an issue on a cluster with favored nodes, the first thing we do is turn off the balancer so that we don't have to worry about which nodes are configured to have which regions. That's literally step one of debugging. Turn off this feature. We'll have a party when FB no longer has this operational nightmare. I won't sign anyone up for the same. I won't sign myself up for the same. So I'm -1 on anything that I can't completely remove. rm -rf.
* Assignment manager is already too complex; adding more complexity is awful.
* Region movement is already too stateful; adding more is awful.
* Configuration of HBase is already way too complex; multiplying that with multiple groups is awful.
* Admin already has way too many things for users to do that cause issues; adding more ways for a cluster to be borked is awful.
RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14309: --- Attachment: 14309-v7.txt Nice catch, Jerry. Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt, 14309-v7.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720383#comment-14720383 ] Mikhail Antonov commented on HBASE-14322: - Sorry for delay, thanks for clarification, I see now - basically the piece to handle region transition reports about system tables was there before, but misplaced in AnnotationReadingPriorityFunction so when requests are coming from admin user, we never get this code path. Let me take a look a bit more. Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
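The ordering bug Mikhail describes can be sketched as a tiny standalone model. This is a hypothetical simplification, not the actual AnnotationReadingPriorityFunction code: the constants, parameter names, and QoS values below are illustrative stand-ins.

```java
// Simplified, hypothetical model of the priority routing discussed above.
// The real logic lives in AnnotationReadingPriorityFunction; constants and
// parameters here are illustrative, not the actual HBase QoS values.
public class PriorityModel {
  public static final int NORMAL_QOS = 0;
  public static final int HIGH_QOS = 200;  // priority handler pool (superusers)
  public static final int META_QOS = 300;  // transition reports touching system tables

  /**
   * The fix places the system-table check BEFORE the superuser shortcut:
   * master and regionserver run as the same (super) user, so without this
   * ordering a report that meta is online could queue behind requests that
   * are themselves waiting on meta, and everything would time out.
   */
  public static int getPriority(boolean isSuperUser,
                                boolean isRegionTransitionReport,
                                boolean touchesSystemTable) {
    if (isRegionTransitionReport && touchesSystemTable) {
      return META_QOS;
    }
    if (isSuperUser) {
      return HIGH_QOS;
    }
    return NORMAL_QOS;
  }
}
```

With the original ordering (superuser check first), a master reporting a system-table transition would land in the same saturated priority pool as every other superuser request, which is the failure mode the patch addresses.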
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720395#comment-14720395 ] Francis Liu commented on HBASE-6721: {quote} All of this has already been tried at FB and it was a mistake. This ends up looking functionally very similar to 0.89-fb's favored nodes. ( Only assign regions to specific set of machines that's configured by the admin ). It's so bad that almost every time we try and solve an issue on a cluster with favored nodes, the first thing we do is turn off the balancer so that we don't have to worry about which nodes are configured to have which regions. That's literally step one of debugging. Turn off this feature. We'll have a party when FB no longer has this operational nightmare. I won't sign anyone up for the same. I won't sign myself up for the same. {quote} IMHO that's not an objective comparison. Favored Nodes and Region Server groups are very different. Their use cases are very different and their implementations are also very different. As for how useful it is for us (and potentially for others), if we actually removed region server groups I'm pretty sure our HBase team and SEs would revolt :-). If we didn't have this feature we would be managing around 80 HBase clusters right now instead of the 6 multi-tenant clusters we are currently running. Step one of debugging is not turning off the balancer; that would make things worse. In fact one of the useful features of region server groups is quickly isolating tables to a new group if they are misbehaving or their workload has changed. This can be done in a few minutes if not seconds. {quote} Assignment manager is already too complex adding more complexity is awful {quote} If you look at the patch, the change in AM is an extra 20 lines of code. 6 lines are just bugfixes that should be done to AM anyway, and the other 14 lines, which are fairly straightforward, we can even live without if that's what it takes.
{quote} region movement is already too stateful. Adding more is awful {quote} Adding more states? {quote} Configuration of HBase is already way too complex. Multiplying that with multiple groups is awful. {quote} Not sure what the concern here is? That there's an option to configure a different balancer? RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to it, provides a client application a level of isolation and resource allocation. The proposal essentially is to have an AssignmentManager which is aware of RegionServer groups and assigns tables to region servers based on groupings. 
Load balancing will occur on a per group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
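The core idea in the proposal, scoping assignment targets to a table's group and balancing each group independently, can be sketched as a small standalone model. This is a hypothetical illustration, not the patch's actual API; the class and method names are invented stand-ins.

```java
// Simplified, hypothetical sketch of group-scoped assignment as described in
// the HBASE-6721 proposal. Types and names are illustrative stand-ins.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class GroupScopedAssignment {
  private final Map<String, String> serverToGroup; // server -> group name
  private final Map<String, String> tableToGroup;  // table -> group name

  public GroupScopedAssignment(Map<String, String> serverToGroup,
                               Map<String, String> tableToGroup) {
    this.serverToGroup = serverToGroup;
    this.tableToGroup = tableToGroup;
  }

  /**
   * Only servers in the table's group are eligible assignment targets, which
   * is what gives tenants isolation: balancing then runs independently
   * within each group and never spills regions into another group.
   */
  public List<String> candidateServers(String table) {
    String group = tableToGroup.getOrDefault(table, "default");
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, String> e : serverToGroup.entrySet()) {
      if (group.equals(e.getValue())) {
        out.add(e.getKey());
      }
    }
    Collections.sort(out); // deterministic order for callers
    return out;
  }
}
```

A group-aware balancer would call something like candidateServers() when building assignment plans, so moving a misbehaving table to a fresh group is just a metadata change followed by reassignment within the new candidate set.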
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720408#comment-14720408 ] Mikhail Antonov commented on HBASE-14322: - Looks good to me, +1. Few nits - QosTestHelper needs audience annotation and some class javadoc? Also unused import in newly added test; I'd consider adding your first comment (Basically reporting that a region is in transition needs to access meta. If reporting that meta is online is stuck behind requests trying to access meta, then everything times out and fails.) to javadoc somewhere in MAPRF? Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14322) Master still not using more than it's priority threads
[ https://issues.apache.org/jira/browse/HBASE-14322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720323#comment-14720323 ] Elliott Clark commented on HBASE-14322: --- Ping? This is running in production for us. It made it possible to restart the master while doing a rolling deploy (even if things are still weird they work with this patch). Master still not using more than it's priority threads -- Key: HBASE-14322 URL: https://issues.apache.org/jira/browse/HBASE-14322 Project: HBase Issue Type: Sub-task Components: master, rpc Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 1.2.0 Attachments: HBASE-14322-v1.patch, HBASE-14322-v2.patch, HBASE-14322-v3-branch-1.patch, HBASE-14322-v3.patch, HBASE-14322-v4-branch-1.patch, HBASE-14322-v5-branch-1.patch, HBASE-14322-v6.patch, HBASE-14322.patch Master and regionserver will be running as the same user. Superusers by default adds the current user as a super user. Super users' requests always go to the priority threads. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14309) Allow load balancer to operate when there is region in transition by adding force flag
[ https://issues.apache.org/jira/browse/HBASE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-14309: --- Attachment: (was: 14309-v7.txt) Allow load balancer to operate when there is region in transition by adding force flag -- Key: HBASE-14309 URL: https://issues.apache.org/jira/browse/HBASE-14309 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.3.0 Attachments: 14309-branch-1.1.txt, 14309-v1.txt, 14309-v2.txt, 14309-v3.txt, 14309-v4.txt, 14309-v5-branch-1.txt, 14309-v5.txt, 14309-v5.txt, 14309-v6.txt This issue adds boolean parameter, force, to 'balancer' command so that admin can force region balancing even when there is region in transition - assuming RIT being transient. This enhancement was requested by some customer. The assumption of this change is that the operator has run hbck and has a reasonable idea why regions are stuck in transition before using the force flag. There was a recent event at the customer where a cluster ended up with a small number of regionservers hosting most of the regions on the cluster (one regionserver had 50% of the roughly 20,000 regions). The balancer couldn't be run due to the small number of regions that were stuck in transition. The admin ended up killing the regionservers so that reassignment would yield a more equitable distribution of the regions. On a different cluster, there was a single store file that had corrupt HDFS blocks (the SSDs on the cluster were known to lose data). However, since this single region (out of 10s of 1000s of regions on this cluster) was stuck in transition, the balancer couldn't run. While the state keeping in HBase isn't so good yet that the admin can kick off the balancer automatically in such scenarios knowing when it is safe to do so and when it is not, having this option available for the operator to use as he / she sees fit seems prudent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720350#comment-14720350 ] Hudson commented on HBASE-13965: FAILURE: Integrated in HBase-0.98 #1103 (See [https://builds.apache.org/job/HBase-0.98/1103/])
HBASE-14289 Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 (tedyu: rev 7a0d36fecfebb1f61811181347f90a9b88a9d09b)
* hbase-hadoop1-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java

Stochastic Load Balancer JMX Metrics
Key: HBASE-13965
URL: https://issues.apache.org/jira/browse/HBASE-13965
Project: HBase
Issue Type: Improvement
Components: Balancer, metrics
Reporter: Lei Chen
Assignee: Lei Chen
Fix For: 2.0.0, 1.3.0
Attachments: 13965-addendum.txt, HBASE-13965-branch-1-v2.patch, HBASE-13965-branch-1.patch, HBASE-13965-v10.patch, HBASE-13965-v11.patch, HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965-v7.patch, HBASE-13965-v8.patch, HBASE-13965-v9.patch, HBASE-13965_v2.patch, HBase-13965-JConsole.png, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png

Today's default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable, but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack sizes (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers, with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRackCost and RegionCountSkewCost is difficult without a way to attribute each cost function's contribution to the overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
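[Editor's note] The cost-function values described above are exposed as attributes on a JMX MBean. The sketch below shows the generic access pattern via the platform MBeanServer; it deliberately queries a standard `java.lang` bean so it is self-contained, and the HBase balancer bean name given in the comment is a hypothetical placeholder, not the name the patch registers.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: reading attributes from an MBean via the platform MBeanServer.
// Balancer cost metrics would be read the same way, with the appropriate
// ObjectName, e.g. something like "Hadoop:service=HBase,name=Master,sub=Balancer"
// (hypothetical name for illustration only).
public class JmxReadSketch {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Self-contained stand-in: a bean every JVM registers.
        ObjectName name = new ObjectName("java.lang:type=OperatingSystem");
        Object procs = server.getAttribute(name, "AvailableProcessors");
        System.out.println("AvailableProcessors = " + procs);
    }
}
```

The same pattern is what JConsole (see the attached HBase-13965-JConsole.png) performs under the hood when browsing attributes.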
[jira] [Commented] (HBASE-14289) Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720351#comment-14720351 ] Hudson commented on HBASE-14289: FAILURE: Integrated in HBase-0.98 #1103 (See [https://builds.apache.org/job/HBase-0.98/1103/])
HBASE-14289 Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 (tedyu: rev 7a0d36fecfebb1f61811181347f90a9b88a9d09b)
* hbase-hadoop1-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/FavoredNodeLoadBalancer.java
* hbase-hadoop-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSource.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancer.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/balancer/TestBaseLoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/BaseLoadBalancer.java
* hbase-hadoop1-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/TestStochasticBalancerJmxMetrics.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
* hbase-hadoop2-compat/src/main/java/org/apache/hadoop/hbase/master/balancer/MetricsStochasticBalancerSourceImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/StochasticLoadBalancer.java
* hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
* hbase-hadoop2-compat/src/main/resources/META-INF/services/org.apache.hadoop.hbase.master.balancer.MetricsStochasticBalancerSource
* hbase-server/src/main/java/org/apache/hadoop/hbase/master/balancer/SimpleLoadBalancer.java

Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
---
Key: HBASE-14289
URL: https://issues.apache.org/jira/browse/HBASE-14289
Project: HBase
Issue Type: Improvement
Reporter: Ted Yu
Assignee: Ted Yu
Fix For: 0.98.15
Attachments: 14289-0.98-v2.txt, 14289-0.98-v3.txt, 14289-0.98-v4.txt, 14289-0.98-v5.txt

The default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. This issue backports HBASE-13965 to the 0.98 branch to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-14289) Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reopened HBASE-14289: Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 --- Key: HBASE-14289 URL: https://issues.apache.org/jira/browse/HBASE-14289 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.15 Attachments: 14289-0.98-v2.txt, 14289-0.98-v3.txt, 14289-0.98-v4.txt, 14289-0.98-v5.txt The default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. This issue backports HBASE-13965 to 0.98 branch to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6721) RegionServer Group based Assignment
[ https://issues.apache.org/jira/browse/HBASE-6721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720745#comment-14720745 ] Nick Dimiduk commented on HBASE-6721: - {quote} bq.This is not only a multi-tenancy solution but an isolation solution. Isolation is the worse solution for getting multi-tenancy. {quote} Maybe you can elaborate on this point? Seems we need to quarantine users from each other, whether that's physically as per this patch or via imposed resource controls within a single process. Either way, quarantine is the same as isolation; we're isolating users from each other to achieve fairness of service delivery in a multi-tenant environment. RegionServer Group based Assignment --- Key: HBASE-6721 URL: https://issues.apache.org/jira/browse/HBASE-6721 Project: HBase Issue Type: New Feature Reporter: Francis Liu Assignee: Francis Liu Labels: hbase-6721 Attachments: 6721-master-webUI.patch, HBASE-6721 GroupBasedLoadBalancer Sequence Diagram.xml, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721-DesigDoc.pdf, HBASE-6721_0.98_2.patch, HBASE-6721_10.patch, HBASE-6721_11.patch, HBASE-6721_12.patch, HBASE-6721_8.patch, HBASE-6721_9.patch, HBASE-6721_9.patch, HBASE-6721_94.patch, HBASE-6721_94.patch, HBASE-6721_94_2.patch, HBASE-6721_94_3.patch, HBASE-6721_94_3.patch, HBASE-6721_94_4.patch, HBASE-6721_94_5.patch, HBASE-6721_94_6.patch, HBASE-6721_94_7.patch, HBASE-6721_98_1.patch, HBASE-6721_98_2.patch, HBASE-6721_hbase-6721_addendum.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk.patch, HBASE-6721_trunk1.patch, HBASE-6721_trunk2.patch, balanceCluster Sequence Diagram.svg, immediateAssignments Sequence Diagram.svg, randomAssignment Sequence Diagram.svg, retainAssignment Sequence Diagram.svg, roundRobinAssignment Sequence Diagram.svg In multi-tenant deployments of HBase, it is likely that a RegionServer will be serving out regions from a number of different tables 
owned by various client applications. Being able to group a subset of running RegionServers and assign specific tables to that group provides a client application a level of isolation and resource allocation. The proposal, essentially, is to have an AssignmentManager that is aware of RegionServer groups and assigns tables to region servers based on those groupings. Load balancing will occur on a per-group basis as well. This is essentially a simplification of the approach taken in HBASE-4120. See the attached document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14289) Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98
[ https://issues.apache.org/jira/browse/HBASE-14289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720751#comment-14720751 ] Ted Yu commented on HBASE-14289: Reverted. Let me know the branch of Phoenix you used for compilation. Backport HBASE-13965 'Stochastic Load Balancer JMX Metrics' to 0.98 --- Key: HBASE-14289 URL: https://issues.apache.org/jira/browse/HBASE-14289 Project: HBase Issue Type: Improvement Reporter: Ted Yu Assignee: Ted Yu Fix For: 0.98.15 Attachments: 14289-0.98-v2.txt, 14289-0.98-v3.txt, 14289-0.98-v4.txt, 14289-0.98-v5.txt The default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. This issue backports HBASE-13965 to 0.98 branch to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14315) Save one call to KeyValueHeap.peek per row
[ https://issues.apache.org/jira/browse/HBASE-14315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720758#comment-14720758 ] Hudson commented on HBASE-14315: FAILURE: Integrated in HBase-1.1 #641 (See [https://builds.apache.org/job/HBase-1.1/641/])
HBASE-14315 Save one call to KeyValueHeap.peek per row. (larsh: rev 74c5ee41ed10971b8d7cc515695d3ca07faff13a)
* hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java

Save one call to KeyValueHeap.peek per row
--
Key: HBASE-14315
URL: https://issues.apache.org/jira/browse/HBASE-14315
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 2.0.0, 1.1.2, 1.3.0, 0.98.15, 1.2.1, 1.0.3
Attachments: 14315-0.98.txt, 14315-master.txt

Another one of my micro-optimizations. In StoreScanner.next(...) we can actually save a call to KeyValueHeap.peek, which in my runs of scan-heavy loads shows up at the top. Depending on the run and data, this can save between 3 and 10% of runtime.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
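[Editor's note] The patch itself is not shown here, but the shape of the optimization is generic: when the fetch call already reports exhaustion, a separate peek per element is redundant. A minimal illustration using java.util.PriorityQueue — this is not the actual StoreScanner/KeyValueHeap code, just the pattern:

```java
import java.util.List;
import java.util.PriorityQueue;

public class PeekOnceScan {
    // Naive: peek() is called once per iteration just to test for exhaustion,
    // then the element is fetched again with poll() -- one extra heap
    // operation per element.
    static long sumNaive(PriorityQueue<Integer> heap) {
        long sum = 0;
        while (heap.peek() != null) {
            sum += heap.poll();
        }
        return sum;
    }

    // Optimized: poll() already returns the head (or null when empty), so a
    // single call per element both tests for exhaustion and fetches the value.
    static long sumOptimized(PriorityQueue<Integer> heap) {
        long sum = 0;
        Integer head;
        while ((head = heap.poll()) != null) {
            sum += head;
        }
        return sum;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> a = new PriorityQueue<>(List.of(3, 1, 2));
        PriorityQueue<Integer> b = new PriorityQueue<>(List.of(3, 1, 2));
        // Both variants produce the same result; the second does fewer
        // heap operations per element.
        System.out.println(sumNaive(a) + " " + sumOptimized(b));
    }
}
```

In a heap-of-scanners like KeyValueHeap, each peek is comparatively expensive, which is why shaving one call per row is measurable on scan-heavy workloads.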
[jira] [Commented] (HBASE-14261) Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.
[ https://issues.apache.org/jira/browse/HBASE-14261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720761#comment-14720761 ] Enis Soztutar commented on HBASE-14261: --- Thanks Srikanth. The patch looks almost there. Some more comments:

bq. Currently, zk nodes are being read using ZKServerTool and the actions get triggered on those nodes pointed to by the output.

Not sure what exactly was being pointed to here.

Sorry I missed this part in the first patch. This still prints to System.out and creates a new Configuration. We want to pass the conf in and only print to System.out when ZkServerTool.main() is called.
{code}
+ public static ServerName[] readZKNodes() {
{code}

This should be ZOOKEEPER_SERVER since it is not a daemon of HBase.
{code}
HBASE_ZOOKEEPER("zookeeper")
{code}

Is there a way to ask the NN for the list of datanodes instead of reading the {{slaves}} file? In my deployment, the slaves file is under {{$HADOOP_CONF_DIR}}.
{code}
+ for (String line: FileUtils.readLines(new File(hadoopHome + "/etc/hadoop/slaves"))) {
{code}

This should default to hbase rather than root:
{code}
+ return conf.get("hbase.it.clustermanager.hbase.user", "root");
{code}

In deployments I have seen, hdfs daemons and mapred daemons are run as different users (typically hdfs and mapred). We are only interested in killing hdfs datanodes for now. So maybe {{hbase.it.clustermanager.hadoop.hdfs.user}}?
{code}
+ return conf.get("hbase.it.clustermanager.hadoop.user", "hadoop");
{code}

This does not belong in RemoteShell IMO. Passing the user directly is better.
{code}
+private String getServiceUser() {
{code}

Better to sleep 100ms rather than 1 sec.
{code}
+ Threads.sleep(1000);
{code}

Some lines are using 4 instead of 2 spaces as indentation.

Enhance Chaos Monkey framework by adding zookeeper and datanode fault injections.
- Key: HBASE-14261 URL: https://issues.apache.org/jira/browse/HBASE-14261 Project: HBase Issue Type: Improvement Reporter: Srikanth Srungarapu Assignee: Srikanth Srungarapu Attachments: HBASE-14261-branch-1.patch, HBASE-14261.branch-1_v2.patch One of the shortcomings of existing ChaosMonkey framework is lack of fault injections for hbase dependencies like zookeeper, hdfs etc. This patch attempts to solve this problem partially by adding datanode and zk node fault injections. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
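[Editor's note] The reviewer's first point — have ZkServerTool accept configuration and return data, confining System.out printing to main() — is a general pattern worth sketching. Hadoop's Configuration is replaced by java.util.Properties so the sketch stays self-contained; the class name and config key are hypothetical stand-ins, not the patch's actual code:

```java
import java.util.Properties;

// Sketch of the review suggestion: library code takes its configuration as
// a parameter and returns data, so callers (and tests) can reuse it without
// capturing stdout; printing happens only in main().
public class ServerListTool {
    // Pure function: config in, server list out, no side effects.
    static String[] readServers(Properties conf) {
        return conf.getProperty("servers", "").split(",");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("servers", "zk1,zk2,zk3");  // stand-in for real config
        for (String s : readServers(conf)) {
            System.out.println(s);  // output confined to the CLI entry point
        }
    }
}
```

This is what lets the Chaos Monkey actions call the helper with the cluster's existing conf instead of constructing a fresh Configuration and parsing printed output.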
[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14334: -- Attachment: HBASE-14334.patch Just a move and refer to classes by name so there's no compile time dependency. Move Memcached block cache in to it's own optional module. -- Key: HBASE-14334 URL: https://issues.apache.org/jira/browse/HBASE-14334 Project: HBase Issue Type: Bug Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HBASE-14334.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14334) Move Memcached block cache in to it's own optional module.
[ https://issues.apache.org/jira/browse/HBASE-14334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HBASE-14334: -- Fix Version/s: 1.2.0 2.0.0 Affects Version/s: 1.2.0 Status: Patch Available (was: Open) Move Memcached block cache in to it's own optional module. -- Key: HBASE-14334 URL: https://issues.apache.org/jira/browse/HBASE-14334 Project: HBase Issue Type: Bug Affects Versions: 1.2.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14334.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14258) Make region_mover.rb script case insensitive with regard to hostname
[ https://issues.apache.org/jira/browse/HBASE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14258: -- Attachment: HBASE-14258.patch.add Addendum fixes case sensitivity in 'load' Make region_mover.rb script case insensitive with regard to hostname Key: HBASE-14258 URL: https://issues.apache.org/jira/browse/HBASE-14258 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 Attachments: HBASE-14258.patch, HBASE-14258.patch.add The script is case sensitive and fails when case of a host name being unloaded does not match with a case of a region server name returned by HBase API. This doc clarifies IETF rules on case insensitivities in DNS: https://www.ietf.org/rfc/rfc4343.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14258) Make region_mover.rb script case insensitive with regard to hostname
[ https://issues.apache.org/jira/browse/HBASE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720779#comment-14720779 ] Vladimir Rodionov commented on HBASE-14258: --- [~te...@apache.org], addendum to the original patch that fixes case sensitivity in the 'load' method. Make region_mover.rb script case insensitive with regard to hostname Key: HBASE-14258 URL: https://issues.apache.org/jira/browse/HBASE-14258 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Priority: Minor Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3 Attachments: HBASE-14258.patch, HBASE-14258.patch.add The script is case sensitive and fails when case of a host name being unloaded does not match with a case of a region server name returned by HBase API. This doc clarifies IETF rules on case insensitivities in DNS: https://www.ietf.org/rfc/rfc4343.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14258) Make region_mover.rb script case insensitive with regard to hostname
[ https://issues.apache.org/jira/browse/HBASE-14258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14720784#comment-14720784 ] Ted Yu commented on HBASE-14258:
{code}
+if hostFromServerName == hostname.upcase and portFromServerName == port
{code}
Upcasing the hostname can be done outside the loop, right?

Make region_mover.rb script case insensitive with regard to hostname
Key: HBASE-14258
URL: https://issues.apache.org/jira/browse/HBASE-14258
Project: HBase
Issue Type: Bug
Reporter: Vladimir Rodionov
Assignee: Vladimir Rodionov
Priority: Minor
Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3
Attachments: HBASE-14258.patch, HBASE-14258.patch.add

The script is case sensitive and fails when the case of a host name being unloaded does not match the case of the region server name returned by the HBase API. This doc clarifies IETF rules on case insensitivity in DNS: https://www.ietf.org/rfc/rfc4343.txt

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
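[Editor's note] The review suggestion above — compute the upcased hostname once, outside the loop — can be sketched as follows. Variable and method names are hypothetical stand-ins for region_mover.rb internals, not the actual patch; HBase ServerName strings have the form "host,port,startcode".

```ruby
# Find the ServerName string matching the given hostname (case-insensitively)
# and port, upcasing the target hostname only once rather than per comparison.
def find_server(server_names, hostname, port)
  host_upcased = hostname.upcase  # hoisted: computed once, not per iteration
  server_names.find do |sn|
    host_from_server_name, port_from_server_name = sn.split(',')[0, 2]
    host_from_server_name.upcase == host_upcased && port_from_server_name == port
  end
end

puts find_server(['Host1.example.com,16020,1440000000000'],
                 'host1.EXAMPLE.COM', '16020')
# => Host1.example.com,16020,1440000000000
```

Hoisting the upcase out of the loop avoids allocating a fresh upcased string for the target hostname on every region server examined.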