[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621502#comment-14621502 ]

Ben Lau commented on HBASE-13991:

Hi stack, we did consider doing an online migration from the old format to the new one, but it would require messy changes to the codebase and be tricky to test fully. That's because whenever you access a region's contents, you would have to check both the humongous and the non-humongous paths instead of just using the one you know applies. There is also a lot more going on during an online migration: regions can be moving, splitting, or recovering as part of normal cluster operation, and testing that an online migration works in all of those cases would be tricky. This can be ameliorated to some extent by making the migration 'mostly online', i.e. offlining regions, migrating them, then re-opening them. For Yahoo’s use case an online migration is not necessary, but if the community really needs it we could look into it.

[~toffer] can comment more, but I believe we would prefer to insert the buckets under the table directory for now and gradually transition later to reworking meta to be the source of table/region association information, creating a uniform/non-table-oriented data directory, etc.

Hierarchical Layout for Humongous Tables
Key: HBASE-13991
URL: https://issues.apache.org/jira/browse/HBASE-13991
Project: HBase
Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf

Add support for humongous tables via a hierarchical layout for regions on filesystem. Credit for most of this code goes to Huaiyu Zhu.

Latest version of the patch is available on the review board: https://reviews.apache.org/r/36029/

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
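For readers following along, the 'mostly online' migration mentioned in the comment above (offline a region, migrate it, re-open it) could be sketched roughly as below. This is only an illustration, not code from the patch: the `Region` class, the `layout` strings, and the `move_files` callback are all hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Region:
    # Hypothetical stand-in for a region; not an HBase class.
    name: str
    online: bool = True
    layout: str = "flat"  # "flat" = old layout, "hierarchical" = new layout


def migrate_mostly_online(regions, move_files):
    """Migrate one region at a time: offline it, move its files, reopen it.

    Only the region being migrated is unavailable at any moment, so the
    table as a whole stays mostly online.
    """
    migrated = []
    for region in regions:
        if region.layout == "hierarchical":
            continue  # already migrated, so the loop is safe to re-run
        region.online = False  # stop serving before touching any files
        try:
            move_files(region)  # assumed callback that relocates the files
            region.layout = "hierarchical"
            migrated.append(region.name)
        finally:
            region.online = True  # reopen even if the move failed
    return migrated
```

Note that the sketch sidesteps exactly the hard parts raised in the comment: it assumes no region is splitting, moving, or recovering while the loop runs.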
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619553#comment-14619553 ]

stack commented on HBASE-13991:

bq. What do you guys think about this approach?

We'd have to keep up two ways of accessing files. (One of the lads used to try hard to keep all filesystem access encapsulated inside a class but was frustrated because not all of us played along...)

Tell us more about how you think it would work, [~benlau]? Do you still want to insert that tier of 'buckets' under a table as per your attached doc? Have you considered being able to do an online migration from the old format to the new?
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615813#comment-14615813 ]

Ben Lau commented on HBASE-13991:

Hi guys, hope you had a happy 4th of July. We would like to do something akin to Lars’ last idea. That is, we will have code to support both the old layout and the new layout, but the choice will be made per HBase cluster. You will be able to migrate a cluster entirely to the hierarchical layout or leave it on the old layout. This approach has the following pros:

- If HBase users do not need/want the new layout, they will not have to do an offline upgrade in order to use new HBase code. The alternative is to make the upgrade to the hierarchical layout online, but this would require some very messy changes to the codebase and also be tricky to test fully.
- HBase code will not have to ‘detect’ whether tables/paths/regions are hierarchical or not. The master or region server can simply look at the root table at startup and use that to determine whether the cluster has migrated to the hierarchical layout. This single source of truth would make the code less ugly, since you don’t need to do in-context per-region/path checks in different parts of the codebase.

What do you guys think about this approach?
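To make the 'single source of truth' point in the comment above concrete, here is a minimal sketch of a resolver that is told the cluster's layout once at startup and never inspects individual paths afterwards. Everything in it is an assumption made for illustration: the `RegionPathResolver` name, the simplified `<root>/data/<table>/<region>` shape, and the choice of the last two characters of the encoded region name as the bucket are not taken from the patch or the design doc.

```python
from pathlib import PurePosixPath


class RegionPathResolver:
    """Resolves region directories from one cluster-wide layout flag."""

    def __init__(self, root_dir, hierarchical):
        # The flag is decided once (e.g. at master/region server startup)
        # and then applies to every table in the cluster.
        self.root = PurePosixPath(root_dir)
        self.hierarchical = hierarchical

    def region_dir(self, table, encoded_region):
        table_dir = self.root / "data" / table
        if not self.hierarchical:
            # Old layout: regions sit directly under the table directory.
            return table_dir / encoded_region
        # New layout: one extra bucket tier between table and region.
        # Using the name's last two characters is an assumption of this sketch.
        bucket = encoded_region[-2:]
        return table_dir / bucket / encoded_region
```

A caller would build one resolver per process, for example `RegionPathResolver("/hbase", hierarchical=True)`, and pass it around, so no other part of the code has to probe the filesystem to guess which layout a path uses.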
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613446#comment-14613446 ]

Andrew Purtell commented on HBASE-13991:

bq. I think we agree that it may make sense to switchover entirely to the new layout instead of making it optional.

+1. Agreed, there's broad agreement on that. It also looks like we should be able to split the need-to-haves from the nice-to-haves, so we can scope the work into increments and make it possible for some changes to go further back than master.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609389#comment-14609389 ]

Lars Hofhansl commented on HBASE-13991:

Let's solve the problem at hand first :)

Can we simply use the new layout, always? That means everything (including snapshots) needs to be supported. Code would have to be able to detect old and new layout and support both, possibly forever (or we use an implicit table flag only set on new tables).
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609514#comment-14609514 ]

Ben Lau commented on HBASE-13991:

Hi guys, thanks for all the feedback. I think we agree that it may make sense to switch over entirely to the new layout instead of making it optional. Let me get back to you guys; I need to talk with Francis some more about the other suggestions. Thanks guys.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609376#comment-14609376 ]

Enis Soztutar commented on HBASE-13991:

I don't like introducing a concept of humongous tables. I can definitely understand that you wanted to get the layout changes in without affecting other users, but adding another switch is not the way to go, for operational reasons. If we are doing a layout change, we should do it for all the tables, and find a way to handle migrations automatically, etc. In what other areas does a humongous table differ from a regular table? What happens if a table starts as a regular table, then becomes humongous?

I like where this is going. Sorry, [~benlau], this might again turn into a bigger "re-architect some parts" kind of jira instead of a simpler patch. But I agree with Stack and Matteo that we should take a holistic view. See: https://issues.apache.org/jira/browse/HBASE-7806?focusedCommentId=13594284&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13594284

HBASE-13159 also talks about unifying reference files / file links and snapshot files so that the soft links are much easier to maintain.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608540#comment-14608540 ]

Matteo Bertozzi commented on HBASE-13991:

Instead of doing an incompatible change to work around just this problem, we should look into what else we can solve by changing the fs layout. Some of the points on my list are:

* Avoid moving files around, tmp -> table region -> archive
** Avoid the hack “if file is not here, try there” of HFileLink
* Avoid rename() calls to simulate transactions (e.g. compaction, split, creation, deletion, ...)
** rename calls in some environments (e.g. s3) are full copies instead of just a metadata operation
* File sharing between different tables without links (“Clone Table”)
** Simplify snapshot/restore reference code and avoid all the calls to fs.listStatus(), fs.createNew()
** Avoid the write permission required in MR over snapshots (for backlinks creation)

We should have a single /data dir where we place data, and then each table will point to that. You'll avoid moving the files around (for tmp-creation/commit and archiving), and your data is not tied together with a table, allowing things like snapshots, clones and read-replicas to work without hacks. You'll also gain some future ability to do some kind of deduplication and better compaction logic.

If you look at the last slide of https://issues.apache.org/jira/secure/attachment/12568749/HBASE-7806.pdf, there was a proposed layout where you have this kind of separation. You can store the list of files in meta as Stack mentioned, or you can have some manifest file containing the current state of the table (something like the SnapshotManifest, https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Snapshot.proto#L41).

The point is: do not tie the data to the logical placement of tables/regions, and have an atomic operation for when you add/remove files. Think about features like snapshots and replicas, where the files are not owned by only one region.
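As a rough illustration of the manifest idea in the comment above (a table points at files in a shared data directory, and adding/removing files is one atomic step), consider the in-memory sketch below. It is not HBase code and not the SnapshotManifest format; the `TableManifest` class, its `commit` method, and the file ids are hypothetical, and a real implementation would have to persist the manifest durably.

```python
import threading


class TableManifest:
    """Tracks which shared data files a table currently references."""

    def __init__(self, files=()):
        self._lock = threading.Lock()
        self._files = frozenset(files)
        self.version = 0

    def files(self):
        # Callers get an immutable snapshot that later commits never mutate.
        return self._files

    def commit(self, add=(), remove=()):
        """Apply additions and removals as a single all-or-nothing step."""
        with self._lock:
            missing = set(remove) - self._files
            if missing:
                raise ValueError(f"cannot remove unknown files: {sorted(missing)}")
            # Build the new set first, then publish it with one assignment,
            # so readers see either the old state or the new one, never a mix.
            self._files = (self._files - frozenset(remove)) | frozenset(add)
            self.version += 1
            return self.version
```

Under these assumptions a compaction becomes `manifest.commit(add=["f3"], remove=["f1", "f2"])` with no file moved or renamed, and a clone or snapshot is just another manifest built from the same file ids.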
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606793#comment-14606793 ]

Ben Lau commented on HBASE-13991:

Sure. Created a reviewboard request here: https://reviews.apache.org/r/36029/
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606381#comment-14606381 ]

Ted Yu commented on HBASE-13991:

Interesting initiative.

bq. and 2x 500 GB hard drives

Please clarify: the master machine has more memory than the region server machines. I expected that machine to have a bigger disk, not a smaller one.

In the patch, I don't see AssignmentManager being modified. So is this change mostly limited to the filesystem layout?
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606521#comment-14606521 ]

Ben Lau commented on HBASE-13991:

The HBase master doesn't write much to disk compared to region servers/data nodes.

Yes, this change is mostly to the filesystem layout. We may use the 'humongous' flag for other things in the future, but currently it is only used to determine the layout.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606802#comment-14606802 ]

Hadoop QA commented on HBASE-13991:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12742556/HBASE-13991-master.patch
  against master branch at commit f8bd578b80b4e656d799c82ca1b6191e35bb0ae4.
  ATTACHMENT ID: 12742556

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 94 new or modified tests.

-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14610//console

This message is automatically generated.
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606783#comment-14606783 ]

Ted Yu commented on HBASE-13991:

Can you put the patch on reviewboard? Thanks
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606824#comment-14606824 ]

Ted Yu commented on HBASE-13991:

Would there be a tool that converts the current region layout to the hierarchical one?
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607544#comment-14607544 ]

stack commented on HBASE-13991:

[~benlau] Thank you for writing up the design and for the helpful experiments.

Simply put, HDFS was not designed to handle the existence of millions of files/directories in a single directory.

Yes.

How about just making a four-level bucket of all HFiles? From http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf: Each tablet’s SSTables are registered in the METADATA table. (Our [~mbertozzi] has been going on about going this route forever -- see his design attached to https://issues.apache.org/jira/browse/HBASE-7806... It has some things in common w/ yours.)

Thanks [~benlau]
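One way to picture the 'four-level bucket of all HFiles' suggestion above is to derive the bucket directories from a hash of the file name, so that no single directory ever holds all the files. The sketch below is only an assumption about how that could look: the `bucketed_path` helper, the use of SHA-1, and the two-hex-character width per level are invented for the example and are not something the comment or the patch specifies.

```python
import hashlib
from pathlib import PurePosixPath


def bucketed_path(data_root, file_name, levels=4, width=2):
    """Place file_name under `levels` nested bucket directories.

    Buckets are consecutive slices of the hex digest of the file name, so
    the same name always maps to the same path and files spread evenly.
    """
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(data_root, *buckets, file_name)
```

With two hex characters per level each directory has at most 256 children, and four levels give 256^4 leaf buckets. Since the path is computable from the name alone, locating a file needs no directory listing, though which table or region owns it would still have to be recorded elsewhere (e.g. in meta, as the comment suggests).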
[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables
[ https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606940#comment-14606940 ]

Ben Lau commented on HBASE-13991:

I don't think we had one planned, but if enough parties were interested in a tool we could probably make one. It would probably not be too hard (unless I'm missing something) for a table that can be taken offline for a short period of time.

On a side note, there were some new conflicts in master since I created the patch, so I have fixed them and uploaded a new patch to the reviewboard. I'll treat the reviewboard as the holder of the current version of the patch and leave the attachment in this ticket as the first-draft submission.