[jira] [Commented] (HBASE-11811) Use binary search for seeking into a block
[ https://issues.apache.org/jira/browse/HBASE-11811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604001#comment-14604001 ]

Lars Hofhansl commented on HBASE-11811:

Sure, go ahead.

Use binary search for seeking into a block

Key: HBASE-11811
URL: https://issues.apache.org/jira/browse/HBASE-11811
Project: HBase
Issue Type: Brainstorming
Reporter: Lars Hofhansl
Assignee: Vladimir Rodionov
Attachments: 11811-wip-v2.txt, 11811-wip-v4.txt, block_index-v2.txt

Currently, upon every seek (including Gets), we need to scan linearly through the block from the beginning until we find the Cell we are looking for. It should be possible to build a simple cache of Cell offsets for each block as it is loaded, and then use binary search to find the Cell in question.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
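The offset cache proposed in HBASE-11811 can be sketched roughly as follows. This is only an illustration and not the attached patch: the block layout (a 1-byte length prefix followed by the key bytes) and the class name OffsetIndexedBlock are assumptions made for the example, and real HBase blocks and Cells are considerably more involved.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical block: length-prefixed keys stored back to back in sorted order. */
final class OffsetIndexedBlock {
    private final byte[] block;
    private final int[] offsets; // start offset of each entry, cached when the block is loaded

    OffsetIndexedBlock(byte[] block) {
        this.block = block;
        List<Integer> found = new ArrayList<>();
        int pos = 0;
        while (pos < block.length) {        // one linear pass, done once per block
            found.add(pos);
            pos += 1 + (block[pos] & 0xFF); // skip the length byte and the key bytes
        }
        offsets = new int[found.size()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = found.get(i);
        }
    }

    int count() {
        return offsets.length;
    }

    /** Returns the index of the first entry whose key is >= target, or count() if there is none. */
    int seek(byte[] target) {
        int lo = 0;
        int hi = offsets.length;
        while (lo < hi) {                   // binary search over the cached offsets
            int mid = (lo + hi) >>> 1;
            if (compareAt(mid, target) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Unsigned lexicographic comparison of the key at the given entry against target. */
    private int compareAt(int index, byte[] target) {
        int start = offsets[index] + 1;
        int len = block[offsets[index]] & 0xFF;
        int n = Math.min(len, target.length);
        for (int i = 0; i < n; i++) {
            int diff = (block[start + i] & 0xFF) - (target[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return len - target.length;
    }

    /** Test helper: encodes already-sorted keys (each shorter than 256 bytes) into the assumed layout. */
    static byte[] encode(String... sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String key : sortedKeys) {
            byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }
}
```

The trade-off in this sketch is one extra int per entry held in memory per loaded block, in exchange for O(log n) key comparisons per seek instead of a scan from the start of the block.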
[jira] [Commented] (HBASE-13639) SyncTable - rsync for HBase tables
[ https://issues.apache.org/jira/browse/HBASE-13639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604045#comment-14604045 ]

Hudson commented on HBASE-13639:

FAILURE: Integrated in HBase-0.98 #1042 (See [https://builds.apache.org/job/HBase-0.98/1042/])
Amend HBASE-13639 SyncTable - rsync for HBase tables (apurtell: rev df7ac74745ab881800d01d48a3a7f05c6a7992f4)
* hbase-hadoop1-compat/src/main/java/org/apache/hadoop/mapreduce/lib/output/MapFileOutputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHashTable.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/HashTable.java

SyncTable - rsync for HBase tables

Key: HBASE-13639
URL: https://issues.apache.org/jira/browse/HBASE-13639
Project: HBase
Issue Type: New Feature
Reporter: Dave Latham
Assignee: Dave Latham
Fix For: 2.0.0, 0.98.14, 1.2.0
Attachments: HBASE-13639-0.98-addendum-hadoop-1.patch, HBASE-13639-0.98.patch, HBASE-13639-v1.patch, HBASE-13639-v2.patch, HBASE-13639-v3-0.98.patch, HBASE-13639-v3.patch, HBASE-13639.patch

Given HBase tables in remote clusters with similar but not identical data, efficiently update a target table such that the data in question is identical to a source table. Efficiency in this context means using far less network traffic than would be required to ship all the data from one cluster to the other. Takes inspiration from rsync.

Design doc: https://docs.google.com/document/d/1-2c9kJEWNrXf5V4q_wBcoIXfdchN7Pxvxv1IO6PW0-U/
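To illustrate how an rsync-inspired comparison can save network traffic, here is a minimal sketch that hashes key ranges on each side and reports only the ranges whose digests differ. It is not the HashTable/SyncTable implementation from the patch: the class RangeHashDiff, the in-memory SortedMap standing in for a table, and the choice of SHA-256 are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical sketch: compare two sorted "tables" range by range using digests. */
final class RangeHashDiff {

    /** Returns one digest per key range; the sorted split keys define splits.size() + 1 ranges. */
    static List<String> hashRanges(SortedMap<String, String> rows, List<String> splits) {
        List<String> digests = new ArrayList<>();
        for (int i = 0; i <= splits.size(); i++) {
            SortedMap<String, String> range = rows;
            if (i > 0) {
                range = range.tailMap(splits.get(i - 1)); // keys >= lower split
            }
            if (i < splits.size()) {
                range = range.headMap(splits.get(i));     // keys < upper split
            }
            digests.add(digest(range));
        }
        return digests;
    }

    /** Indexes of the ranges whose digests differ; only these need their rows shipped. */
    static List<Integer> differingRanges(List<String> source, List<String> target) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < source.size(); i++) {
            if (!source.get(i).equals(target.get(i))) {
                out.add(i);
            }
        }
        return out;
    }

    private static String digest(SortedMap<String, String> range) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : range.entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so key and value boundaries stay unambiguous
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

In this sketch only the fixed-size digests cross the network in the common case, and full rows are exchanged only for the ranges that differ.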
[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots
[ https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604044#comment-14604044 ]

Hudson commented on HBASE-13356:

FAILURE: Integrated in HBase-0.98 #1042 (See [https://builds.apache.org/job/HBase-0.98/1042/])
Amend HBASE-13356 HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots (Andrew Mains) (apurtell: rev cfb4827326b6743cb732b92580152bcf46647b2c)
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java

HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

Key: HBASE-13356
URL: https://issues.apache.org/jira/browse/HBASE-13356
Project: HBase
Issue Type: New Feature
Components: mapreduce
Reporter: Andrew Mains
Assignee: Andrew Mains
Priority: Minor
Fix For: 2.0.0, 0.98.14, 1.2.0
Attachments: HBASE-13356-0.98-addendum-hadoop-1.patch, HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, HBASE-13356.2.patch, HBASE-13356.3.patch, HBASE-13356.4.patch, HBASE-13356.patch

Currently, HBase supports the pushing of multiple scans to mapreduce jobs over live tables (via MultiTableInputFormat) but only supports a single scan for mapreduce jobs over table snapshots. It would be handy to support multiple scans over snapshots as well, probably through another input format (MultiTableSnapshotInputFormat?). To mimic the functionality present in MultiTableInputFormat, the new input format would likely have to take in the names of all snapshots used in addition to the scans.
[jira] [Commented] (HBASE-13959) Region splitting takes too long because it uses a single thread in most common cases
[ https://issues.apache.org/jira/browse/HBASE-13959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603999#comment-14603999 ]

Lars Hofhansl commented on HBASE-13959:

Heh... See 13959-suggest.txt, which I attached with the same comment. :)
Basically your patch, but it defaults the max to the max # of storefiles. So in typical setups one does not have to worry about this setting.

Region splitting takes too long because it uses a single thread in most common cases

Key: HBASE-13959
URL: https://issues.apache.org/jira/browse/HBASE-13959
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.98.12
Reporter: Hari Krishna Dara
Assignee: Hari Krishna Dara
Priority: Critical
Fix For: 0.98.14
Attachments: 13959-suggest.txt, HBASE-13959-2.patch, HBASE-13959-3.patch, HBASE-13959-4.patch, HBASE-13959.patch, region-split-durations-compared.png

When storefiles need to be split as part of a region split, the current logic uses a threadpool whose size is set to the number of stores. Since the most common table setup involves only a single column family, this translates to a single store, so the threadpool runs with a single thread. However, in a write-heavy workload there could be several tens of storefiles in a store at the time of splitting, and with a threadpool size of one these files end up being split sequentially.

With a bit of tracing, I noticed that it takes an average of 350ms to create a single reference file, and splitting each storefile involves creating two of these. With a storefile count of 20, it therefore takes about 14s just to get through this phase (2 reference files for each storefile), pushing the total time the region is offline to 18s or more. In environments that are set up to fail fast, this makes the client exhaust all retries and fail with NotServingRegionException.

The fix should increase the concurrency of this operation.
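The arithmetic in the report, and the effect of a larger pool, can be sketched as below. This is an idealized estimate and not the attached patch: the class SplitPhaseEstimate and its methods are hypothetical, the 350ms figure is the average quoted above, and the model assumes reference-file creation parallelizes perfectly across threads, which real HDFS latencies will not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of the storefile-splitting phase described in HBASE-13959. */
final class SplitPhaseEstimate {
    /** Average time to create one reference file, as reported in the issue. */
    static final long REFERENCE_FILE_MILLIS = 350;

    /** Idealized duration: each storefile needs two reference files; files are spread over the threads. */
    static long estimateMillis(int storeFiles, int threads) {
        int filesPerThread = (storeFiles + threads - 1) / threads; // rounded up
        return filesPerThread * 2 * REFERENCE_FILE_MILLIS;
    }

    /** Runs one task per storefile on a fixed pool sized by the caller, not by the store count. */
    static <T> List<T> splitConcurrently(List<Callable<T>> tasks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows any failure from a split task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Under these assumptions, estimateMillis(20, 1) gives the 14s from the report, while a pool sized to the 20 storefiles would bring the same phase down to roughly the cost of a single storefile.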
[jira] [Commented] (HBASE-13814) AssignmentManager does not write the correct server name into Zookeeper when unassign region
[ https://issues.apache.org/jira/browse/HBASE-13814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604003#comment-14604003 ]

Lars Hofhansl commented on HBASE-13814:

+1 on v2.

AssignmentManager does not write the correct server name into Zookeeper when unassign region

Key: HBASE-13814
URL: https://issues.apache.org/jira/browse/HBASE-13814
Project: HBase
Issue Type: Bug
Components: Region Assignment
Affects Versions: 0.94.27
Reporter: cuijianwei
Priority: Minor
Attachments: HBASE-13814-0.94-v1.patch, HBASE-13814-0.94-v2.patch

When moving a region, the region is first unassigned from its region server by the method AssignmentManager#unassign(). AssignmentManager writes the region info and the server name into ZooKeeper with the following code:
{code}
versionOfClosingNode = ZKAssign.createNodeClosing(
    master.getZooKeeper(), region, master.getServerName());
{code}
It seems that the AssignmentManager misuses the master's name as the server name. Suppose the ROOT region is being moved and the region server holding the ROOT region has just crashed. The master will try to start a MetaServerShutdownHandler if the server is judged to be holding a meta region. That judgment is made by the method AssignmentManager#isCarryingRegion, which first checks the server name in ZooKeeper:
{code}
ServerName addressFromZK = (data != null && data.getOrigin() != null) ?
    data.getOrigin() : null;
if (addressFromZK != null) {
  // if we get something from ZK, we will use the data
  boolean matchZK = (addressFromZK != null &&
      addressFromZK.equals(serverName));
{code}
Because of the wrong server name in ZooKeeper, the server is not judged to be holding the ROOT region, so the master starts a ServerShutdownHandler instead. Unlike MetaServerShutdownHandler, ServerShutdownHandler does not assign the ROOT region first, so the ROOT region is never assigned.

In our test environment, we encountered this problem when moving the ROOT region and stopping the region server concurrently.
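The failure mode can be reproduced in miniature with the sketch below. It is a simplification for illustration only: plain strings stand in for ServerName, the class ClosingNodeCheck and both methods are hypothetical, and the fallback to an in-memory owner when ZooKeeper has no data is an assumption based on the "if we get something from ZK" comment in the quoted snippet.

```java
/** Hypothetical, simplified model of the isCarryingRegion decision described in HBASE-13814. */
final class ClosingNodeCheck {

    /**
     * Mirrors the quoted logic: when the closing node carries an origin, that value
     * decides the outcome; otherwise fall back to the in-memory owner (assumed).
     */
    static boolean isCarryingRoot(String originFromZk, String inMemoryOwner, String crashedServer) {
        if (originFromZk != null) {
            return originFromZk.equals(crashedServer);
        }
        return crashedServer.equals(inMemoryOwner);
    }

    /** Picks the handler the master would start for the crashed server. */
    static String handlerFor(boolean carryingRoot) {
        return carryingRoot ? "MetaServerShutdownHandler" : "ServerShutdownHandler";
    }
}
```

With the master's own name written into the closing node, isCarryingRoot returns false for the crashed region server even though it held ROOT, so the sketch selects ServerShutdownHandler, which matches the behaviour reported above.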
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604273#comment-14604273 ]

Hadoop QA commented on HBASE-8642:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12742324/HBASE-8642-v2.patch against master branch at commit 7dbb2e69776bae8c2f2781f36528c0e784f93a06.
ATTACHMENT ID: 12742324

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+puts No snapshots matched the table name regular expression #{tableNameregex.to_s} and the snapshot name regular expression #{snapshotNameRegex.to_s} if count == 0
+puts #{successfullyDeleted} snapshots successfully deleted. unless successfullyDeleted == 0
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14593//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14593//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14593//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14593//console

This message is automatically generated.

[Snapshot] List and delete snapshot by table

Key: HBASE-8642
URL: https://issues.apache.org/jira/browse/HBASE-8642
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2
Reporter: Julian Zhou
Assignee: Ashish Singhi
Fix For: 2.0.0
Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642.patch

Support listing and deleting snapshots by table name.
User scenario: A user wants to delete all the snapshots taken in January for a table 't', where the snapshot names start with 'Jan'.
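The user scenario can be sketched as a filter over two regular expressions, one for the table name and one for the snapshot name, as the shell messages quoted in the QA report suggest. This is an illustration only and not the API added by the patch: the SnapshotFilter class, its nested Snapshot type, and the matching method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: select snapshots by table-name regex and snapshot-name regex. */
final class SnapshotFilter {

    /** Minimal stand-in for a snapshot description: its name and the table it was taken from. */
    static final class Snapshot {
        final String name;
        final String table;

        Snapshot(String name, String table) {
            this.name = name;
            this.table = table;
        }
    }

    /** Returns the names of snapshots whose table and name both fully match the given patterns. */
    static List<String> matching(List<Snapshot> all, String tableRegex, String nameRegex) {
        Pattern tablePattern = Pattern.compile(tableRegex);
        Pattern namePattern = Pattern.compile(nameRegex);
        List<String> selected = new ArrayList<>();
        for (Snapshot s : all) {
            if (tablePattern.matcher(s.table).matches() && namePattern.matcher(s.name).matches()) {
                selected.add(s.name);
            }
        }
        return selected;
    }
}
```

For the scenario above, calling matching(snapshots, "t", "Jan.*") would select the January snapshots of table 't', and the resulting names could then be passed to whatever delete call the patch provides.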
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604189#comment-14604189 ]

Ashish Singhi commented on HBASE-8642:

Patch addressing Matteo's concern. Please review.

[Snapshot] List and delete snapshot by table

Key: HBASE-8642
URL: https://issues.apache.org/jira/browse/HBASE-8642
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2
Reporter: Julian Zhou
Assignee: Ashish Singhi
Fix For: 2.0.0
Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642.patch

Support listing and deleting snapshots by table name.
User scenario: A user wants to delete all the snapshots taken in January for a table 't', where the snapshot names start with 'Jan'.
[jira] [Updated] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Singhi updated HBASE-8642:

Attachment: HBASE-8642-v2.patch

[Snapshot] List and delete snapshot by table

Key: HBASE-8642
URL: https://issues.apache.org/jira/browse/HBASE-8642
Project: HBase
Issue Type: Improvement
Components: snapshots
Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2
Reporter: Julian Zhou
Assignee: Ashish Singhi
Fix For: 2.0.0
Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642.patch

Support listing and deleting snapshots by table name.
User scenario: A user wants to delete all the snapshots taken in January for a table 't', where the snapshot names start with 'Jan'.
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604229#comment-14604229 ]

Ted Yu commented on HBASE-13964:

I see.
Let's wait till we hear some feedback from users who enable namespace quota on how normalization should be done.

Skip region normalization for tables under namespace quota

Key: HBASE-13964
URL: https://issues.apache.org/jira/browse/HBASE-13964
Project: HBase
Issue Type: Task
Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
Fix For: 2.0.0, 1.2.0, 1.3.0
Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt

As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables.
[jira] [Commented] (HBASE-13964) Skip region normalization for tables under namespace quota
[ https://issues.apache.org/jira/browse/HBASE-13964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604417#comment-14604417 ]

Ted Yu commented on HBASE-13964:

It seems that metadata associated with split / merge requests would allow the server side to distinguish the requests initiated by the normalizer from the ones triggered through other means.
As long as the net effect of splitting / merging initiated by the normalizer doesn't increase the number of regions, normalization should be allowed when namespace quota is in effect.

Skip region normalization for tables under namespace quota

Key: HBASE-13964
URL: https://issues.apache.org/jira/browse/HBASE-13964
Project: HBase
Issue Type: Task
Components: Balancer, Usability
Reporter: Mikhail Antonov
Assignee: Ted Yu
Fix For: 2.0.0, 1.2.0, 1.3.0
Attachments: 13964-branch-1-v2.txt, 13964-branch-1-v3.txt, 13964-v1.txt

As [~te...@apache.org] pointed out in HBASE-13103, we need to discuss how to normalize regions of tables under namespace control. What was proposed is to disable normalization of such tables.
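The "net effect" condition in the comment above could be checked along these lines. This is only a sketch of the idea under discussion, not normalizer code: the class NormalizationPlanCheck and its Action enum are hypothetical, and it assumes a split turns one region into two (+1) while a merge turns two regions into one (-1).

```java
import java.util.List;

/** Hypothetical sketch: allow a normalization plan only if it does not grow the region count. */
final class NormalizationPlanCheck {

    enum Action { SPLIT, MERGE }

    /** Net change in region count: each split adds one region, each merge removes one. */
    static int netRegionDelta(List<Action> plan) {
        int delta = 0;
        for (Action action : plan) {
            delta += (action == Action.SPLIT) ? 1 : -1;
        }
        return delta;
    }

    /** Under a namespace quota, accept the plan only when the region count does not increase. */
    static boolean allowedUnderQuota(List<Action> plan) {
        return netRegionDelta(plan) <= 0;
    }
}
```

A plan that pairs every split with a merge would pass this check, while a plan with more splits than merges would be rejected.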
[jira] [Commented] (HBASE-13936) Improve configuration framework
[ https://issues.apache.org/jira/browse/HBASE-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604404#comment-14604404 ]

Apekshit Sharma commented on HBASE-13936:

I think that Hadoop-style site files are a good design and should be left as they are. The scope of this project and the design changes we have thought of so far (in the doc) will be invisible to users and will only impact devs.

bq. Moving from Configuration to ConfigurationManager
So the basic idea is to encapsulate Configuration within ConfigurationManager and provide a better API for handling configurations. That will help in building a better framework for dynamic configurations, type-checking configuration values, and getting rid of a few other bad patterns.

Since the aim here is to promote the right patterns (and possibly to design the framework so that it's not possible to do otherwise), I will highlight the major issues here and get everyone's opinions.
[~apurtell] On that note, what do you think about the issue of set*() functions (my last post)?

Improve configuration framework

Key: HBASE-13936
URL: https://issues.apache.org/jira/browse/HBASE-13936
Project: HBase
Issue Type: Umbrella
Reporter: Apekshit Sharma
Attachments: DynamicConfigs.v01.docx, design.png

Here's the design doc: https://docs.google.com/document/d/1WiO2bqguR2DaVT-J2SZTCONbQ3pEhpbOI_bbLMaXRjE/edit#

Main changes:
get*(foo.bar, default_value) --> get*(HConfig.FOO_BAR) // using enums
Robust framework and better documentation for dynamic configurations.

Basic overview of new design:
!design.png!
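The move from get*(foo.bar, default_value) to get*(HConfig.FOO_BAR) might look something like the sketch below, where each enum constant carries its key and default so call sites cannot disagree about either. Everything here is an assumption for illustration: ConfigSketch is a hypothetical stand-in for the proposed ConfigurationManager, the keys and defaults are invented, and the design doc may define the API quite differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of enum-keyed configuration lookups. */
final class ConfigSketch {

    /** Each constant bundles the property key with its single, central default value. */
    enum HConfig {
        FOO_BAR("foo.bar", "10"),            // invented key and default
        FOO_ENABLED("foo.enabled", "false"); // invented key and default

        final String key;
        final String defaultValue;

        HConfig(String key, String defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    private final Map<String, String> values = new HashMap<>();

    /** Stands in for values loaded from site files. */
    void set(String key, String value) {
        values.put(key, value);
    }

    /** Callers name the setting; the default comes from the enum, not from the call site. */
    int getInt(HConfig config) {
        return Integer.parseInt(values.getOrDefault(config.key, config.defaultValue));
    }

    boolean getBoolean(HConfig config) {
        return Boolean.parseBoolean(values.getOrDefault(config.key, config.defaultValue));
    }
}
```

One consequence of this shape is that a mistyped key becomes a compile error rather than a silent fallback to the default, which seems to be part of what the comment means by promoting the right patterns.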