Since a Get becomes a Scan in the region servers, the major difference between the two should be the time it takes to open/close the Scan.
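To make that first point concrete, here is a toy in-memory model. It is not HBase's actual code path, and the class and method names are made up for illustration. Rows are kept sorted by unsigned byte order, a scan returns every row in [startRow, stopRow), and a get is expressed as a scan whose range covers exactly one key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model, NOT HBase code: rows sorted by unsigned byte order,
// which is how HBase orders row keys.
public class GetAsScan {
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    final TreeMap<byte[], String> rows = new TreeMap<>(UNSIGNED);

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    void put(String row, String value) { rows.put(b(row), value); }

    // Scan: every row with startRow <= key < stopRow.
    NavigableMap<byte[], String> scan(byte[] startRow, byte[] stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    // Get modeled as a one-row scan: row + 0x00 is the smallest key that
    // sorts after row, so the range [row, row + 0x00) can only hold row.
    String get(byte[] row) {
        byte[] stopRow = Arrays.copyOf(row, row.length + 1); // trailing 0x00
        NavigableMap<byte[], String> hit = scan(row, stopRow);
        return hit.isEmpty() ? null : hit.firstEntry().getValue();
    }

    public static void main(String[] args) {
        GetAsScan region = new GetAsScan();
        region.put("a", "1");
        region.put("ab", "2");
        region.put("b", "3");
        System.out.println(region.get(b("a")));                 // 1
        System.out.println(region.get(b("c")));                 // null
        System.out.println(region.scan(b("a"), b("b")).size()); // 2
    }
}
```

In the model both calls perform the same seek into sorted storage. The only remaining difference is the setup and teardown of the scanner.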
My concern would be the number of distinct values you would have to index. If that number is low and you use the wide-table approach, each index row ends up holding a column for every matching main-table row. You'd get very fat rows that occupy big chunks of a region, because a single row is never split across regions. I guess you also have to decide whether you need to return all those rows at once. You should definitely benchmark both solutions.

J-D

On Wed, Jun 15, 2011 at 11:48 AM, Ashwin Pejavar wrote:
> I need to index my main hbase table on some column values. The available
> indexing solutions like Lily are a little too heavyweight for my simple
> requirements and so I decided to roll my own.
>
> Based on my reading, there seem to be two main options:
>
> 1) For every column value that needs to be indexed on the main table, add
> index table records where the rowkey is of the following form:
> <Optional prefix><column-name><column-value><main-table-rowkey>
>
> The rowkey is added to the index table record to support non-unique indexes
> and also to avoid a get to check for existence, before the put.
>
> The index is accessed by creating a scan where the startRow is initialized to
> <Optional prefix><column-name><column-value> and setting a
> BinaryPrefixComparator RowFilter for the same row-key prefix to stop the
> scan. For every record returned by the scan, get the original table rowKey
> and do a get.
>
> I have glossed over some details like ensuring that
> <Optional prefix><column-name> is of a fixed size when the table supports
> indexes for multiple columns.
>
> 2) Use a wide table approach where the index record rowkey is of the form:
> <Optional prefix><column-name><column-value> and the main-table-rowkey is
> added as columns e.g. "col-family:<main-table-rowkey>"
>
> The index is accessed through a simple get with the index rowkey
> <Optional prefix><column-name><column-value>.
>
> My question is, is one of these approaches preferable to the other from a
> performance perspective? Will a get significantly outperform a scan with a
> startRow and a BinaryPrefixComparator RowFilter or are the two forms
> equivalent?
>
> Thanks,
> - Ashwin
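For reference, the two index layouts from the question can be sketched in memory. In this sketch, TreeMaps ordered by unsigned bytes stand in for HBase tables. No HBase client calls are used, and all class and helper names are hypothetical.

The sketch makes a few assumptions that go beyond the question:

- <column-name> is padded with 0x00 to a fixed 8 bytes, and the optional prefix is omitted.
- The value is terminated by a 0x00 byte, which assumes values never contain one. Without a terminator, a prefix scan for the value "red" would also match index rows for "redder".
- The tall-table lookup is bounded with an explicit stop row (the prefix with its last byte incremented) instead of a RowFilter. A RowFilter only drops non-matching rows; it does not by itself end the scan.
- The follow-up get against the main table is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory sketch of the two index layouts. TreeMaps ordered by unsigned
// bytes stand in for HBase tables; no HBase client API is used.
public class IndexLayouts {
    static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;
    static final int COLUMN_NAME_WIDTH = 8; // assumption: fixed width for <column-name>

    static byte[] b(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    static byte[] concat(byte[]... parts) {
        int len = 0;
        for (byte[] p : parts) len += p.length;
        byte[] out = new byte[len];
        int off = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, off, p.length);
            off += p.length;
        }
        return out;
    }

    // <column-name> right-padded with 0x00 so every indexed column has a same-size prefix.
    static byte[] columnPrefix(String column) {
        byte[] raw = b(column);
        if (raw.length > COLUMN_NAME_WIDTH) throw new IllegalArgumentException("column name too long");
        return Arrays.copyOf(raw, COLUMN_NAME_WIDTH);
    }

    // Approach 1: one index row per (value, main row); 0x00 ends the value
    // (assumes values never contain 0x00).
    static byte[] tallKey(String column, String value, String mainRow) {
        return concat(columnPrefix(column), b(value), new byte[] {0}, b(mainRow));
    }

    // Smallest key sorting after every key that starts with prefix: drop
    // trailing 0xFF bytes, increment the last one. null means "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    // Bounded "scan" over [prefix, stopRow); the main-table rowkey is the key suffix.
    static List<String> lookupTall(TreeMap<byte[], byte[]> index, String column, String value) {
        byte[] start = concat(columnPrefix(column), b(value), new byte[] {0});
        byte[] stop = stopRowForPrefix(start);
        NavigableMap<byte[], byte[]> range =
            (stop == null) ? index.tailMap(start, true) : index.subMap(start, true, stop, false);
        List<String> mainRows = new ArrayList<>();
        for (byte[] key : range.keySet()) {
            mainRows.add(new String(key, start.length, key.length - start.length, StandardCharsets.UTF_8));
        }
        return mainRows;
    }

    // Approach 2: one index row per value; each main-table rowkey is a column qualifier.
    static byte[] wideKey(String column, String value) {
        return concat(columnPrefix(column), b(value));
    }

    // A single "get" of the wide row; its qualifiers are the main-table rowkeys.
    static List<String> lookupWide(TreeMap<byte[], TreeMap<byte[], byte[]>> index, String column, String value) {
        List<String> mainRows = new ArrayList<>();
        TreeMap<byte[], byte[]> row = index.get(wideKey(column, value));
        if (row != null) {
            for (byte[] qualifier : row.keySet()) mainRows.add(new String(qualifier, StandardCharsets.UTF_8));
        }
        return mainRows;
    }

    public static void main(String[] args) {
        TreeMap<byte[], byte[]> tall = new TreeMap<>(UNSIGNED);
        TreeMap<byte[], TreeMap<byte[], byte[]>> wide = new TreeMap<>(UNSIGNED);
        String[][] mainTable = {{"row1", "red"}, {"row2", "blue"}, {"row3", "red"}, {"row4", "redder"}};
        for (String[] r : mainTable) {
            tall.put(tallKey("color", r[1], r[0]), new byte[0]);
            wide.computeIfAbsent(wideKey("color", r[1]), k -> new TreeMap<byte[], byte[]>(UNSIGNED))
                .put(b(r[0]), new byte[0]);
        }
        System.out.println(lookupTall(tall, "color", "red")); // [row1, row3]
        System.out.println(lookupWide(wide, "color", "red")); // [row1, row3]
    }
}
```

In the wide layout, everything for one value sits in a single row whose width grows with the number of matching main-table rows, which is where the fat-row concern above comes from. The tall layout keeps rows small but needs a scanner for every lookup.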
