I think that's one of the questions we're all starting to face.

At my current client, a secondary index embedded into the underlying table 
structure has been extremely valuable. It really helps when you want to set up 
an M/R scanner that filters on something that is not the row key, or on a 
sparse value. An example: my row key is a unique record value, but in batch 
processing I want to filter on a specific job_id. If I create a secondary 
index on job_id, I'll have far fewer rows to process than with a full scan 
with a filter on job_id.
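
To make the difference concrete, here is a rough sketch of the idea. It is a 
toy in-memory model in Python, not the HBase API and not the tableindexed 
code: SortedTable, put_record, full_scan_with_filter and index_lookup are 
names I made up for illustration, and the "job_id|record_id" index key layout 
is just one assumed way to do it (it assumes job_id never contains "|").

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> dict of column values

    def put(self, key, columns):
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = columns

    def scan(self, prefix=""):
        """Yield (key, columns) for keys starting with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1


records = SortedTable()    # row key = unique record id
job_index = SortedTable()  # row key = "<job_id>|<record_id>", no columns


def put_record(record_id, job_id, payload):
    """Write the data row and the matching index row (two puts)."""
    records.put(record_id, {"job_id": job_id, "payload": payload})
    job_index.put(f"{job_id}|{record_id}", {})


def full_scan_with_filter(job_id):
    """Read every data row and keep the ones whose job_id matches."""
    hits, touched = [], 0
    for key, columns in records.scan():
        touched += 1
        if columns["job_id"] == job_id:
            hits.append(key)
    return hits, touched


def index_lookup(job_id):
    """Read only the index rows that start with the job_id prefix."""
    hits, touched = [], 0
    for key, _ in job_index.scan(prefix=f"{job_id}|"):
        touched += 1
        hits.append(key.split("|", 1)[1])
    return hits, touched


# 1000 records spread evenly over 100 jobs.
for i in range(1000):
    put_record(f"rec{i:04d}", f"job{i % 100:02d}", payload=i)

scan_hits, scan_touched = full_scan_with_filter("job07")
index_hits, index_touched = index_lookup("job07")
print(len(scan_hits), scan_touched)    # rows found, rows read by full scan
print(len(index_hits), index_touched)  # rows found, rows read via the index
```

Both paths return the same record ids, but the filtered scan reads every row 
in the table while the index path reads only the rows for that job. The cost 
is the extra write on every put, which in this sketch is done by the writer 
itself rather than on the region side.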

At the same time, if you want to allow ad hoc queries against arbitrary data, 
you'll want a different type of index. 

So the value depends on how you want to use Hadoop/HBase.

The problem is that you have to pull the code off GitHub and build it against 
your version of HBase (0.89 or 0.20.5).

> Date: Wed, 28 Jul 2010 17:05:46 -0400
> Subject: Extending RegionServer for Indexing or using the Client?
> From: [email protected]
> To: [email protected]
> 
> Hi,
> 
> I'm currently looking intensively into indexing for HBase. The Indexer
> maintained on http://github.com/hbase-trx/hbase-transactional-tableindexed
> extends the RegionServer and thus the client just defines the Index
> and then adds one Put with the record towards HBase. The rest is taken
> care of on the Region side by the derived class.
> 
> What do you guys say - does it pay off to implement the Indexing (and
> maybe some other operations that result in a put) on the Region side
> or rather create the Indexer "outside" HBase and then push for
> instance two Puts() towards HBase? I saw that Lily is doing the
> Client-Side way.
> 
> 
> Thx for the great support!
> 
> /SJ
                                          
