You did not mention which version of HBase you are on. In 0.94/trunk there is a RegionSplitPolicy feature that may work in your case:
https://issues.apache.org/jira/browse/HBASE-5304
http://search-hadoop.com/jd/hbase/org/apache/hadoop/hbase/regionserver/RegionSplitPolicy.html
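
To make the idea concrete, here is a rough sketch in plain Java (no HBase classes) of what a prefix-aware split policy has to do: take the split point the region server proposes and cut it down to the row-key prefix, so the region boundary never falls inside one group of rows. The class and method names are made up for illustration and are not part of the HBase API. The sketch also assumes the class id is zero-padded to a fixed width (e.g. "0007" and "0007_0042"), which the sample keys in your mail are not; with variable-length ids a fixed-length prefix would not work as is.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: hypothetical names, not the HBase API.
class PrefixSplitSketch {

    // Cut a proposed split point down to its first prefixLength bytes.
    // The shortened key becomes the start key of the upper daughter region,
    // so every row that begins with this prefix ends up in that one region.
    public static byte[] truncateToPrefix(byte[] proposedSplitPoint, int prefixLength) {
        if (proposedSplitPoint == null || prefixLength <= 0) {
            return proposedSplitPoint; // nothing sensible to truncate to
        }
        int length = Math.min(prefixLength, proposedSplitPoint.length);
        return Arrays.copyOf(proposedSplitPoint, length);
    }

    public static void main(String[] args) {
        // Assumed key layout: 4-digit class id, "_", 4-digit student id.
        byte[] midKey = "0007_0042".getBytes(StandardCharsets.UTF_8);
        byte[] splitPoint = truncateToPrefix(midKey, 4);
        System.out.println(new String(splitPoint, StandardCharsets.UTF_8)); // prints 0007
    }
}
```

With the boundary at "0007", the class row "0007" and all "0007_..." student rows sort at or after the boundary and stay in one region, while everything for class "0006" stays in the region before it.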
I came across this implementation, which may be what you want:
http://search-hadoop.com/jd/hbase/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.html

--Suraj

On Wed, Apr 4, 2012 at 7:52 AM, a <[email protected]> wrote:
> Hello,
>
> Suppose that I have "tall-narrow" HBase table with composite key e.g.
> {class_id}#{student_id}.
>
> The exemplary data will look like as follow:
>
> ROW_KEY | ONE COLLUMN FAMILY
> ----------------------------------------------------------------
> 1       | name = "Object Oriented Programming"
>         | location = "Building A"
>         | semester = "Winter"
>         | // many other information about class
> ----------------------------------------------------------------
> 1_1     | name = "Alice White"
> 1_2     | name = "Betty Lipcon"
> // many other records related to class with ID = 1
> ----------------------------------------------------------------
> // many other records related to class with ID = 2, 3, 4, .. N
>
> I would like to use this HBase table as input source for my MapReduce job,
> where the mapper will emit <key, value> pairs where:
> key = ${class_id}#${student_id},
> value = some information about corresponding class.
>
> Thanks to lexicographically sorting of row keys, it would be easily to
> implement if I could split HBase table into regions where all colocated
> rows (with the same row prefix i.e. {class_id}) will reside in the same
> region. Then for each group of such collocated records, I could use its
> first row to get information about class and emit this information with
> rowkey from each remaining row.
>
> So I would like to ask, if such a custom split is easy to implement?
>
> I know that:
> 1) I could model it with "flat-wide" table and I will have everything what I
> need in separate rows,
> 2) use two MR jobs for that.
>
> but I am interested in best solution for "tall-narrow" table with one MR job.
>
> Many thanks in advance for any hints!
