A small map-reduce program could do the updates to HBase, or if your incremental
data is relatively small, you can do the updates one by one.  This can work
fine, but it doesn't really solve the top-100 term problem.  For that, it
may be nice to have an occasional MR program that over-produces a list of
top items.  Then each update can go against your top item list, which will be
approximately the right list until the next MR update.
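
For the one-by-one case, the per-keyword updates can just be HBase counter
increments.  A rough sketch of what I mean (the "keywords" table, "f" family,
and "count" qualifier are only placeholders, not anything from your setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class KeywordCounter {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed table name; row key is the keyword itself.
    HTable table = new HTable(conf, "keywords");
    try {
      // Atomically bump the counter for one search term.
      table.incrementColumnValue(
          Bytes.toBytes("hbase schema"),   // row key = keyword
          Bytes.toBytes("f"),              // assumed column family
          Bytes.toBytes("count"),          // counter qualifier
          1L);                             // increment by one
    } finally {
      table.close();
    }
  }
}

The periodic MR job would then scan those counters and rewrite the
over-produced top-item list.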

On Wed, Feb 2, 2011 at 11:04 PM, Pete Haidinyak <javam...@cox.net> wrote:

> I will be updating the keywords and their frequency every X minutes so I
> don't believe M/R would work well but I could be wrong. I've been doing this
> approach with other data and have been receiving sub 1 second on my queries
> with two overtaxed servers. I figured this problem has been solved many
> times before and I was just looking for guidance.
>
> Thanks
>
> -Pete
>
>
> On Wed, 02 Feb 2011 16:44:39 -0800, Jean-Daniel Cryans <
> jdcry...@apache.org> wrote:
>
>  I don't think HBase is really needed here, unless you somehow need
>> random read/write to those search queries.
>>
>> J-D
>>
>> On Wed, Feb 2, 2011 at 1:27 PM, Peter Haidinyak <phaidin...@local.com>
>> wrote:
>>
>>> Hi all,
>>>       I was just tasked to take the keywords used for a search and put
>>> them in HBase so we can slice and dice them. They are interested in standard
>>> stuff like highest frequency word, word pairs, etc.
>>>       I know I'm not the first to do this so does anyone have a
>>> recommendation on how to setup a schema for this sort of task?
>>>
>>> Thanks
>>>
>>> -Pete
>>>
>>>
>
