Did you have a look at Lily? A billion items will be interesting, but we offer M/R index rebuilds (against SOLR) as well as incremental updates. You could also take a look at the RowLog library we built to do this in a robust way - it has no Lily dependencies.
www.lilyproject.org

Cheers,

Steven.

On Tue, Oct 12, 2010 at 2:36 PM, Michael Segel <[email protected]> wrote:

> Hi,
>
> Now I realize that most everyone is sitting in NY, while some of us can't
> leave our respective cities....
>
> I came across this problem and was wondering how others have solved it.
>
> Suppose you have a really large table with 1 billion rows of data.
> Since HBase doesn't really have any indexes built in (don't get me started
> about the contrib/transactional stuff...), you're forced to use some sort of
> external index, or roll your own index table.
>
> The net result is that you end up with a list object that contains your
> result set.
>
> So the question is: what's the best way to feed that list object in?
>
> One option I thought about is writing the object to a file, using that file
> as the job input, and then controlling the splits. Not the most efficient,
> but it would work.
>
> I was trying to find a more elegant solution, and I'm sure that anyone
> using SOLR or Lucene or whatever has come across this problem too.
>
> Any suggestions?
>
> Thx

-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily
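The "write the list to a file and control the splitters" idea from the quoted question can be sketched in plain Java. This is only an illustration of the chunking step: the class name `KeySplitter` and the contiguous-chunk strategy are assumptions of mine, not anything from the thread. In a real Hadoop job each chunk would back a custom `InputSplit` (or the key file would be fed through something like `NLineInputFormat`), and each mapper would issue HBase Gets for its keys.

```java
import java.util.ArrayList;
import java.util.List;

public class KeySplitter {
    // Partition a result-set of row keys into nSplits contiguous chunks,
    // mimicking how a custom InputFormat might assign keys to mappers.
    static List<List<String>> split(List<String> keys, int nSplits) {
        List<List<String>> splits = new ArrayList<List<String>>();
        // Ceiling division: every split gets at most `per` keys.
        int per = (keys.size() + nSplits - 1) / nSplits;
        for (int i = 0; i < keys.size(); i += per) {
            splits.add(new ArrayList<String>(
                keys.subList(i, Math.min(i + per, keys.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Stand-in for the list object holding the external index's hits.
        List<String> keys = new ArrayList<String>();
        for (int i = 0; i < 10; i++) keys.add("row-" + i);

        List<List<String>> splits = split(keys, 3);
        System.out.println(splits.size());        // number of splits produced
        System.out.println(splits.get(0).size()); // keys in the first split
    }
}
```

Writing each chunk to its own HDFS file (one key per line) then gives the job one split per file without any custom InputFormat code, at the cost of a pass over the result set up front.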
