Hi, I'm wondering what your design of the key for your index looks like?
My own initial implementation of an inverted index creates a separate table for each distinct column.

Example (key - Col : Col : Col)

A - Id1 : Id2 : Id3
B - Id4 : Id5 : Id6
C - Id7 : Id8 : Id9

This is convenient; however, it can lead to problems, because the number of columns (each holding a referenced key) can grow extremely large.

As far as I understand it (I only had a quick look), the contributed Indexer in HBase maintains its indexes in this way:

AId1 AId2 AId3
BId4 BId5 BId6
CId7 CId8 CId9

I would really like to hear some opinions on this.

/SJ

which then contains the value as the key and the related columns contain the keys to the occurrences. The drawback is that the colu

On Fri, Jul 23, 2010 at 1:44 PM, Luke Forehand <[email protected]> wrote:
> Jamie Cockrill <jamie.cockr...@...> writes:
>
>>
>> Luke,
>>
>> Apologies no, I've been rather sidelined by another issue at the
>> moment. It's always the same, you get to playing with something
>> interesting and you get pulled off to fight fires somewhere else. Once
>> I get back on the case I'll have a look, however it someone did
>> previously mention another library built by the guys building Lily
>> that seems to aim to achieve the same goal. Reposted again here:
>>
>> http://lilycms.org/lily/roadmap/sketchbook/hbaseindexes.html
>>
>> I've not had time to look at it in detail, but it might be a good
>> starting point to get something up and going quickly if that's what
>> you're after.
>>
>> Ta,
>>
>> Jamie
>>
>> On 23 July 2010 16:58, Luke Forehand
>> <luke.foreh...@...> wrote:
>> > Jamie Cockrill <jamie.cockr...@...> writes:
>> >
>
> Jamie,
>
> Thanks for the lead, I'm taking a look at the hbaseindex src now. I'm now
> leaning toward writing and maintaining my own secondary index rather than use
> the contrib IndexedTable stuff. With IndexedTable I don't have enough control
> over the index row key construction, and that is important depending on how I
> want to scan/filter the indexed table. Also, writing the index table from an
> existing huge table with IndexedTableAdmin takes too long and would be better
> suited as a Map Reduce job. These are just a few observations I've made
> after a somewhat cursory glance at the code.
>
> -Luke
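
PS: To make the two key layouts from the top of this message concrete, here is a minimal in-memory sketch in Python. It does not use the HBase client API at all; a dict stands in for the wide-row table and a sorted list stands in for a table ordered by row key. The sample data, the function names, and the `SEP` separator byte are my own assumptions for illustration (the example keys above are written without a separator).

```python
from bisect import bisect_left, insort

# Hypothetical sample data: indexed value -> ids of the rows containing it.
OCCURRENCES = {
    "A": ["Id1", "Id2", "Id3"],
    "B": ["Id4", "Id5", "Id6"],
    "C": ["Id7", "Id8", "Id9"],
}

# Assumed separator between value and id, so that one value can never be
# mistaken for the prefix of another value's keys.
SEP = "\x00"


def build_wide_index(occurrences):
    """Layout 1: one index row per value, one column per referenced id."""
    return {
        value: {row_id: b"" for row_id in ids}
        for value, ids in occurrences.items()
    }


def build_composite_index(occurrences):
    """Layout 2: one index row per (value, id) pair, sorted by row key."""
    keys = []
    for value, ids in occurrences.items():
        for row_id in ids:
            insort(keys, value + SEP + row_id)
    return keys


def lookup_wide(index, value):
    """Fetch a single row and read its column names back as ids."""
    return sorted(index.get(value, {}))


def lookup_composite(keys, value):
    """Prefix scan: seek to the first key for `value`, stop at the next value."""
    prefix = value + SEP
    ids = []
    for key in keys[bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break
        ids.append(key[len(prefix):])
    return ids
```

Both lookups return the same ids for a given value. The difference is where the growth goes: in the first layout a frequent value produces one very wide row, while in the second it produces many small rows that are read with a range scan over the key prefix.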
