[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-09 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621502#comment-14621502
 ] 

Ben Lau commented on HBASE-13991:
-

Hi stack, we did consider doing an online migration from the old format to the 
new one, but it would require messy changes to the codebase and be tricky to 
test fully. Whenever you access a region's contents, you would have to check 
both the humongous and the non-humongous paths instead of just using the layout 
you know the region to have. There is also a lot more going on during an online 
migration: regions can be moving, splitting, or recovering as part of normal 
cluster operation, and testing that the migration works in all of those cases 
would be hard. This can be ameliorated to some extent by making the migration 
'mostly online', i.e. offlining regions, migrating them, then re-opening them.  
For Yahoo’s use case, an online migration is not necessary, but if the community 
really needs it we could look into it.  
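
To illustrate why checking both layouts on every access gets messy, here is a 
minimal sketch in Python. It is not taken from the patch: the helper names, the 
two-character bucket prefix, and the directory shapes are all assumptions made 
for the example.

```python
import os

def flat_region_dir(table_dir, region):
    # Old layout (assumed shape): <table>/<region>
    return os.path.join(table_dir, region)

def bucketed_region_dir(table_dir, region, bucket_chars=2):
    # New layout (assumed shape): <table>/<bucket>/<region>, where the bucket
    # is simply a prefix of the region name. The real patch may bucket differently.
    return os.path.join(table_dir, region[:bucket_chars], region)

def resolve_region_dir(table_dir, region):
    # During an online migration a region could be in either place, so every
    # access has to probe both candidates instead of computing a single path.
    for candidate in (bucketed_region_dir(table_dir, region),
                      flat_region_dir(table_dir, region)):
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(region)
```

Every caller that touches region contents would need something like 
resolve_region_dir(), and the result can change underneath it while a region is 
being moved, which is the part that is hard to test.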

[~toffer] can comment more, but I believe we would prefer to insert the buckets 
under the table directory for now and gradually transition later to reworking 
meta to be the source of table/region association information, creating a 
uniform/non-table oriented data directory, etc.

 Hierarchical Layout for Humongous Tables
 

 Key: HBASE-13991
 URL: https://issues.apache.org/jira/browse/HBASE-13991
 Project: HBase
  Issue Type: Sub-task
Reporter: Ben Lau
Assignee: Ben Lau
 Attachments: HBASE-13991-master.patch, HumongousTableDoc.pdf


 Add support for humongous tables via a hierarchical layout for regions on 
 filesystem.  
 Credit for most of this code goes to Huaiyu Zhu.  
 Latest version of the patch is available on the review board: 
 https://reviews.apache.org/r/36029/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619553#comment-14619553
 ] 

stack commented on HBASE-13991:
---

bq. What do you guys think about this approach?

We'd have to keep up two ways of accessing files (One of the lads used to try 
hard to keep all filesystem access encapsulated inside a class but was 
frustrated because not all of us played along... )

Tell us more about how you think it would work, [~benlau]. Do you still want to 
insert that tier of 'buckets' under a table, as per your attached doc?

Have you considered supporting an online migration from the old format to 
the new?



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-06 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14615813#comment-14615813
 ] 

Ben Lau commented on HBASE-13991:
-

Hi guys, hope you had a happy 4th of July.  

We would like to do something akin to Lars’ last idea.  That is, we will have 
code to support both the old layout and the new layout, but the choice will be 
made per HBase cluster.  You will be able to migrate a cluster entirely to the 
hierarchical layout or leave it on the old layout.  

This approach has the following pros:
- If HBase users do not need/want the new layout, they will not have to do an 
offline upgrade in order to use new HBase code.  The alternative is to support 
an online upgrade to the hierarchical layout, but this would require some very 
messy changes to the codebase and also be tricky to test fully.
- HBase code will not have to ‘detect’ whether tables/paths/regions are 
hierarchical or not.  The master or region server can simply look at the root 
table at startup and use that to determine whether the cluster has migrated to 
the hierarchical layout.  This single source of truth would make the code less 
ugly, since you don’t need to do in-context per-region/path checks in different 
parts of the codebase. 
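
As a rough illustration of the "single source of truth" point, the sketch below 
(Python, purely hypothetical) decides the layout once and lets every path 
computation follow from it. The ClusterLayout class, the two-character bucket, 
and the example path are assumptions, not code from the patch.

```python
from dataclasses import dataclass
import posixpath

@dataclass(frozen=True)
class ClusterLayout:
    # Decided once at startup (for example from a cluster-wide marker) and
    # never re-checked per region or per path afterwards.
    hierarchical: bool

    def region_dir(self, table_dir, region):
        if self.hierarchical:
            # Assumed bucket scheme: first two characters of the region name.
            return posixpath.join(table_dir, region[:2], region)
        return posixpath.join(table_dir, region)
```

With ClusterLayout(hierarchical=True), region_dir("/hbase/data/default/t1", 
"abcdef") returns "/hbase/data/default/t1/ab/abcdef"; with hierarchical=False 
it returns "/hbase/data/default/t1/abcdef". No caller has to probe the 
filesystem to find out which layout is in use.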

What do you guys think about this approach?  



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-07-03 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613446#comment-14613446
 ] 

Andrew Purtell commented on HBASE-13991:


bq.   I think we agree that it may make sense to switchover entirely to the new 
layout instead of making it optional.

+1

There seems to be broad agreement on that. 

It also looks like we should be able to split the need-to-haves from the 
nice-to-haves, scope the work into increments, and make it possible for some 
changes to go further back than master. 



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609389#comment-14609389
 ] 

Lars Hofhansl commented on HBASE-13991:
---

Let's solve the problem at hand first :)

Can we simply use the new layout, always? That means everything (including 
snapshots) needs to be supported.
Code would have to be able to detect old and new layout and support both, 
possibly forever (or we use an implicit table flag only set on new tables).




[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-30 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609514#comment-14609514
 ] 

Ben Lau commented on HBASE-13991:
-

Hi guys, thanks for all the feedback.  I think we agree that it may make sense 
to switch over entirely to the new layout instead of making it optional.  Let me 
get back to you guys; I need to talk with Francis some more about the other 
suggestions.  Thanks guys.



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-30 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609376#comment-14609376
 ] 

Enis Soztutar commented on HBASE-13991:
---

I don't like introducing a concept of humongous tables. I can definitely 
understand that you wanted to get the layout changes in without affecting other 
users, but adding another switch is not the way to go, for operational reasons. 
If we are doing a layout change, we should do it for all the tables, and find a 
way to handle migrations automatically, etc. 

In what other areas does a humongous table differ from a regular table? What 
happens if a table starts as a regular table, then becomes humongous? 

I like where this is going. Sorry, [~benlau], this might again turn into a 
bigger "re-architect some parts" kind of jira instead of a simpler patch. But I 
agree with Stack and Matteo that we should take a holistic view. 

See: 
https://issues.apache.org/jira/browse/HBASE-7806?focusedCommentId=13594284&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13594284

HBASE-13159 also talks about unifying reference files / file links and snapshot 
files so that the soft links are much easier to maintain.  




[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-30 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608540#comment-14608540
 ] 

Matteo Bertozzi commented on HBASE-13991:
-

Instead of making an incompatible change to work around just this problem, 
we should look into what else we can solve by changing the fs layout.

Some of the points on my list are:
 * avoid moving files around: tmp -> table region -> archive
 ** avoid the "if file is not here, try there" hack of HFileLink
 * avoid rename() calls to simulate transactions (e.g. compaction, split, 
creation, deletion, ...)
 ** rename calls in some environments (e.g. S3) are full copies instead of just 
a metadata operation
 * file sharing between different tables without links ("Clone Table")
 ** simplify the snapshot/restore reference code and avoid all the calls to 
fs.listStatus(), fs.createNew()
 ** avoid the write permission required in MR over snapshots (for backlink 
creation)

We should have a single /data dir where we place data, and then each table will 
point to that.
You'll avoid moving files around (for tmp-creation/commit and archiving), and 
your data is not tied to a table, which allows things like snapshots, 
clones and read-replicas to work without hacks. You'll also gain some future 
ability to do some kind of deduplication and better compaction logic.

If you look at the last slide of 
https://issues.apache.org/jira/secure/attachment/12568749/HBASE-7806.pdf
there is a proposed layout with this kind of separation.
You can store the list of files in meta as Stack mentioned, or you can have 
a manifest file containing the current state of the table (something like 
the SnapshotManifest 
https://github.com/apache/hbase/blob/master/hbase-protocol/src/main/protobuf/Snapshot.proto#L41).
The point is: do not tie the data to the logical placement of 
tables/regions, and have an atomic operation for when you add/remove files. Think 
about features like snapshots and replicas, where the files are not owned by 
only one region.
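
One way to picture the manifest idea is the small Python sketch below. It is an 
in-memory stand-in only: the TableManifest class and its methods are 
hypothetical, and a real implementation would need a durable, atomic update in 
meta or in a manifest file rather than a Python assignment.

```python
class TableManifest:
    """Tracks which files in a shared data pool a table currently references."""

    def __init__(self):
        # (version, files) is replaced as one unit on every commit.
        self._state = (0, frozenset())

    def files(self):
        return self._state[1]

    def version(self):
        return self._state[0]

    def commit(self, add=(), remove=()):
        # Build the new file set off to the side, then publish it with a single
        # assignment, so a reader sees either the old set or the new one and
        # never a half-applied compaction or split.
        version, files = self._state
        new_files = (files - frozenset(remove)) | frozenset(add)
        self._state = (version + 1, new_files)

    def clone(self):
        # A clone or snapshot only copies references; no data files move and
        # no link files are created.
        other = TableManifest()
        other._state = (0, self._state[1])
        return other
```

A compaction then becomes commit(add=["f3"], remove=["f1", "f2"]) on the table's 
manifest, while a clone taken earlier keeps pointing at f1 and f2 in the shared 
pool until something decides they are unreferenced.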



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606793#comment-14606793
 ] 

Ben Lau commented on HBASE-13991:
-

Sure.  Created a reviewboard request here: https://reviews.apache.org/r/36029/



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606381#comment-14606381
 ] 

Ted Yu commented on HBASE-13991:


Interesting initiative.

bq. and 2x 500 GB hard drives

Please clarify: the master machine has more memory than the region server 
machines. I expected that machine to have a bigger disk, not a smaller one.

In the patch, I don't see AssignmentManager being modified.
So is this change mostly limited to the filesystem layout?



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606521#comment-14606521
 ] 

Ben Lau commented on HBASE-13991:
-

The HBase master doesn't write much to disk compared to region servers/data 
nodes.  Yes, this change is mostly to the filesystem layout.  We may use the 
'humongous' flag for other things in the future but currently it is only used 
to determine the layout.



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606802#comment-14606802
 ] 

Hadoop QA commented on HBASE-13991:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12742556/HBASE-13991-master.patch
  against master branch at commit f8bd578b80b4e656d799c82ca1b6191e35bb0ae4.
  ATTACHMENT ID: 12742556

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 94 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14610//console

This message is automatically generated.



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606783#comment-14606783
 ] 

Ted Yu commented on HBASE-13991:


Can you put the patch on reviewboard ?

Thanks



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606824#comment-14606824
 ] 

Ted Yu commented on HBASE-13991:


Would there be a tool that converts current region layout to the hierarchical 
one ?



[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607544#comment-14607544
 ] 

stack commented on HBASE-13991:
---

[~benlau] Thank you for writing up the design and for the helpful experiments. 

bq. Simply put, HDFS was not designed to handle the existence of millions of 
files/directories in a single directory. 

Yes.

How about just making a four-level bucket of all HFiles? From 
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf: 
"Each tablet’s SSTables are registered in the METADATA table." (Our 
[~mbertozzi] has been going on about going this route forever -- see his 
design attached to https://issues.apache.org/jira/browse/HBASE-7806... It has 
some things in common w/ yours.)
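
A minimal sketch of what a four-level bucket could look like, in Python. The 
hashing choice (MD5 of the file name), the two hex characters per level, and 
the function name are assumptions for illustration; nothing here comes from the 
patch or the attached doc.

```python
import hashlib
import posixpath

def bucketed_path(root, file_name, levels=4, chars_per_level=2):
    # Hash the file name so files spread evenly regardless of how the
    # names themselves are distributed.
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    # Take consecutive slices of the hex digest as the directory levels.
    parts = [digest[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return posixpath.join(root, *parts, file_name)
```

With two hex characters per level, no bucket directory ever holds more than 256 
subdirectories, and the location of a file can be recomputed from its name 
alone, so whatever registers the files (meta, in the Bigtable approach) only 
needs to store the names.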

Thanks [~benlau]





[jira] [Commented] (HBASE-13991) Hierarchical Layout for Humongous Tables

2015-06-29 Thread Ben Lau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606940#comment-14606940
 ] 

Ben Lau commented on HBASE-13991:
-

I don't think we had one planned but if there were enough parties interested in 
a tool we could probably make one.  It would probably not be too hard (unless 
I'm missing something) for a table that can be taken offline for a short period 
of time.  
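
For a sense of scale, an offline conversion could be as small as the Python 
sketch below, which works on a local directory tree. It is only an 
illustration: the function name, the two-character bucket, and the rule for 
skipping entries are assumptions, and a real tool would go through the HDFS 
API and would also have to update whatever records the region locations.

```python
import os
import shutil

def migrate_table_dir(table_dir, bucket_chars=2):
    """Move <table>/<region> to <table>/<bucket>/<region>.

    Assumes the table is offline, so nothing else touches these directories.
    """
    moved = []
    # The listing is taken once up front, so buckets created below are not
    # visited in this pass.
    for name in sorted(os.listdir(table_dir)):
        src = os.path.join(table_dir, name)
        # Skip plain files, hidden/metadata entries, and names short enough
        # to be bucket directories left by an earlier run.
        if (not os.path.isdir(src) or name.startswith(".")
                or len(name) <= bucket_chars):
            continue
        bucket = os.path.join(table_dir, name[:bucket_chars])
        os.makedirs(bucket, exist_ok=True)
        shutil.move(src, os.path.join(bucket, name))
        moved.append(name)
    return moved
```

Running it a second time on the same directory moves nothing, which makes it 
safe to retry after an interruption under the assumptions above.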

On a side note there were some new conflicts in master since I created the 
patch so I have fixed them and re-uploaded a new patch in the reviewboard.  
I'll treat the reviewboard as the holder of the current version of the patch 
and leave the attachment in this ticket as the 1st draft submission.
