Hi,

Is there any particular advantage to implementing this on the region
side, for instance lower communication overhead or some other
performance gain?

I finally got the GitHub version of the indexer running against the
current trunk of HBase (except for the transactions), but that means
that every time HBase changes its APIs I need to realign my indexer.
I see that as a big problem if I don't want to get too deeply
involved in HBase development.

After adjusting the Indexer code I now also understand why this has
been moved to the contrib directory.

I would appreciate any input on this topic, as I need to decide
whether to implement the indexing against the client or as a
derived feature ...
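
To make the client-side option concrete, here is a minimal sketch of what
"two Puts" could look like. It is only an illustration under assumptions:
the tables are in-memory stand-ins rather than the HBase client API, and
the names SortedTable, indexed_put and rows_for_job are hypothetical. The
index row key is assumed to be job_id + separator + data row key, so that
a prefix scan on the index finds all rows for one job.

```python
from bisect import bisect_left, insort

SEP = "\x00"  # assumed separator; must not occur inside a job_id


class SortedTable:
    """Toy stand-in for a table that keeps its row keys sorted."""

    def __init__(self):
        self.keys = []  # sorted row keys
        self.rows = {}  # row key -> {column: value}

    def put(self, key, columns):
        if key not in self.rows:
            insort(self.keys, key)
            self.rows[key] = {}
        self.rows[key].update(columns)

    def scan_prefix(self, prefix):
        # Start at the first key >= prefix and stop when the prefix ends.
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.rows[self.keys[i]]
            i += 1


def indexed_put(data, index, row_key, columns, indexed_column="job_id"):
    """Client-side indexing: one put for the record, one for the index."""
    data.put(row_key, columns)
    if indexed_column in columns:
        index_key = columns[indexed_column] + SEP + row_key
        index.put(index_key, {"row": row_key})


def rows_for_job(data, index, job_id):
    """Look up records through the index instead of a full scan."""
    hits = index.scan_prefix(job_id + SEP)
    return [data.rows[entry["row"]] for _, entry in hits]
```

In this sketch the two puts are not atomic, and changing the job_id of an
existing row would leave a stale index entry behind; both would need to be
handled by the client.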

/SJ

On Wed, Jul 28, 2010 at 6:07 PM, Michael Segel
<[email protected]> wrote:
>
> I think that's one of the questions we're all starting to face.
>
> At my current client, using a secondary index that is embedded into the
> underlying table structure has extreme value. It really helps when you want
> to set up an m/r scanner() filtering against something that is not the row key
> or is a sparse value. An example... my row key is a unique record value. But
> in batch processing I want to filter on a specific job_id. So if I create a
> secondary index on job_id, I'll have fewer rows to process when compared to a
> full scan with a filter on job_id.
>
> At the same time, if one wants to allow ad hoc queries against arbitrary data 
> ... you'll want a different type of index.
>
> So the value depends on how you want to use Hadoop/HBase.
>
> The problem is that you have to pull the code off GitHub and build it
> against your version of HBase. (0.89 or 0.20.5)
>
>> Date: Wed, 28 Jul 2010 17:05:46 -0400
>> Subject: Extending RegionServer for Indexing or using the Client?
>> From: [email protected]
>> To: [email protected]
>>
>> Hi,
>>
>> I'm currently looking intensively into indexing for HBase. The Indexer
>> maintained on http://github.com/hbase-trx/hbase-transactional-tableindexed
>> extends the RegionServer and thus the client just defines the Index
>> and then sends one Put with the record to HBase. The rest is taken
>> care of on the Region side by the derived class.
>>
>> What do you guys say - does it pay off to implement the Indexing (and
>> maybe some other operations that result in a put) on the Region side
>> or rather create the Indexer "outside" HBase and then push, for
>> instance, two Puts() towards HBase? I saw that Lily is doing it the
>> client-side way.
>>
>>
>> Thx for the great support!
>>
>> /SJ
>
