Hi,

Is there any particular advantage to implementing the indexing on the region side, for example lower communication overhead or other performance gains?

I finally got the GitHub version of the indexer running against the current trunk of HBase (except for the transactions), but that means that every time HBase changes its APIs & Co. I need to realign my indexer. I see that as a big problem if I don't want to get too deeply involved in HBase development. After adjusting the Indexer code I now also understand why it was moved to the contrib directory.

I would appreciate any input on this topic, as I need to decide between implementing the indexing against the client or as a derived feature ...

/SJ

On Wed, Jul 28, 2010 at 6:07 PM, Michael Segel <[email protected]> wrote:
>
> I think that's one of the questions we're all starting to face.
>
> At my current client, using a secondary index that is embedded in to the
> underlying table structure has extreme value. It really helps when you want
> to set up a m/r scanner() filtering against something that is not the row key
> or is a sparse value. An example... my row key is a unique record value. But
> in batch processing I want to filter on a specific job_id. So if I create a
> secondary index on job_id, I'll have fewer rows to process when compared to a
> full scan w a filter on job_id.
>
> At the same time, if one wants to allow ad hoc queries against arbitrary data
> ... you'll want a different type of index.
>
> So the value depends on how you want to use Hadoop/HBase.
>
> The problem is that you have to pull the code off git hub and build it
> against your version of hbase. (0.89 or 0.20.5)
>
>> Date: Wed, 28 Jul 2010 17:05:46 -0400
>> Subject: Extending RegionServer for Indexing or using the Client?
>> From: [email protected]
>> To: [email protected]
>>
>> Hi,
>>
>> I'm currently looking intensively into indexing for HBase. The Indexer
>> maintained on http://github.com/hbase-trx/hbase-transactional-tableindexed
>> extends the RegionServer and thus the client just defines the Index
>> and then adds one Put with the record towards HBase. The rest is taken
>> care on the Region side by the derived class.
>>
>> What do you guys say - does it pay out to implement the Indexing (and
>> maybe some other operations that result in a put) on the Region side
>> or rather create the Indexer "outside" HBase and then push for
>> instance two Puts() towards HBase? I saw that Lily is doing the
>> Client-Side way.
>>
>>
>> Thx for the great support!
>>
>> /SJ
>
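
For illustration, here is a minimal sketch of the client-side option discussed in this thread (two writes per record, one to the data table and one to an index table keyed on job_id). It deliberately does not use the HBase client API: two in-memory sorted maps stand in for the tables, and the class name, method names, and key layout are assumptions made up for this example, not code from the indexer or from Lily.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: sorted maps stand in for HBase tables (rows sorted by key).
class ClientSideIndexSketch {
    // Separator between job_id and row key in the index key (an assumed layout).
    static final char SEP = '\u0000';

    // Stand-in for the data table: row key -> job_id.
    final TreeMap<String, String> dataTable = new TreeMap<>();
    // Stand-in for the index table: "<job_id><SEP><row key>" -> row key.
    final TreeMap<String, String> indexTable = new TreeMap<>();

    // The "two Puts" from the client: one for the record, one for the index entry.
    // The two writes are not atomic; a failure in between leaves a stale index.
    void put(String rowKey, String jobId) {
        String oldJobId = dataTable.put(rowKey, jobId);
        if (oldJobId != null) {
            // Remove the index entry that pointed at the previous job_id.
            indexTable.remove(oldJobId + SEP + rowKey);
        }
        indexTable.put(jobId + SEP + rowKey, rowKey);
    }

    // Range scan over the index: touches only the entries for this job_id.
    List<String> rowsForJobViaIndex(String jobId) {
        String start = jobId + SEP;                 // inclusive
        String stop = jobId + (char) (SEP + 1);     // exclusive
        return new ArrayList<>(indexTable.subMap(start, stop).values());
    }

    // Full scan with a filter: touches every row in the data table.
    List<String> rowsForJobViaFullScan(String jobId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> row : dataTable.entrySet()) {
            if (row.getValue().equals(jobId)) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClientSideIndexSketch tables = new ClientSideIndexSketch();
        tables.put("rec1", "job7");
        tables.put("rec2", "job70");
        tables.put("rec3", "job7");
        System.out.println(tables.rowsForJobViaIndex("job7"));
        System.out.println(tables.rowsForJobViaFullScan("job7"));
    }
}
```

Both lookups return the same rows; the index variant reads only the matching key range, which is the "fewer rows to process" effect described above. The price of doing this on the client is that the application has to keep the two writes consistent itself, whereas a region-side indexer hides that behind a single Put but ties the code to HBase internals.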
