RE: Using external indexes in an HBase Map/Reduce job...

Michael Segel Tue, 12 Oct 2010 09:14:05 -0700

Thanks for the reply...

That's not exactly what I'm looking for...


Suppose you have an exterior system which provides you the list of row keys you 
want. 
Whatever that system is.

So you have a Java List object and you want to run an M/R job that takes its 
input from that list.

What's the best way to do it?


> From: [email protected]
> Date: Tue, 12 Oct 2010 16:54:00 +0400
> Subject: Re: Using external indexes in an HBase Map/Reduce job...
> To: [email protected]
> 
> Hi Michael Segel.
> 
> If I understand your question correctly, you are looking for the optimal way
> to scan index search results? If not, my answer below is not relevant :).
> 
> 1. For MR joins or scans over large index results, Bloom filters can be used,
> as described here:
> http://blog.rapleaf.com/dev/2009/09/25/batch-querying-with-cascading/
> 
> 2. Another option: denormalize data in same or separate table.
> (depends on nature of object relations).
> 
> 3. Random gets. For each row returned by Solr, issue a random get (for really
> small result sets or paging).
> 
> 4. Put compacted data (latest data, a small subset of the data, etc.) into the Solr index.
> 
> 
> 2010/10/12 Michael Segel <[email protected]>:
> >
> > Hi,
> >
> > Now I realize that most everyone is sitting in NY, while some of us can't 
> > leave our respective cities....
> >
> > Came across this problem and I was wondering how others solved it.
> >
> > Suppose you have a really large table with 1 billion rows of data.
> > Since HBase really doesn't have any indexes built in (Don't get me started 
> > about the contrib/transactional stuff...), you're forced to use some sort 
> > of external index, or roll your own index table.
> >
> > The net result is that you end up with a list object that contains your 
> > result set.
> >
> > So the question is... what's the best way to feed the list object in?
> >
> > One option I thought about is writing the object to a file, using that 
> > file as the job input, and then controlling the splits. Not the most 
> > efficient, but it would work.
> >
> > Was trying to find a more 'elegant' solution and I'm sure that anyone using 
> > SOLR or LUCENE or whatever... has come across this problem too.
> >
> > Any suggestions?
> >
> > Thx
> >
> >
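
For reference, the file-based option raised in the original question (write the key list to a file, then control the splits) could be sketched roughly as follows. This is plain Java with no HBase or Hadoop dependency, and it is only an illustration: the class RowKeySplits, its methods, and the file naming are hypothetical, not part of any library. It assumes the keys are strings whose Java sort order matches HBase's byte order (true for ASCII keys), sorts them so each chunk covers a contiguous key range, and writes one file per chunk. In a real job each mapper would presumably read one such file (for example through a line-oriented input format such as Hadoop's NLineInputFormat) and issue a get per key.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase or Hadoop.
final class RowKeySplits {

    // Sort the keys and cut them into at most numSplits chunks of
    // near-equal size, so each chunk covers a contiguous key range.
    static List<List<String>> partition(List<String> rowKeys, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be >= 1");
        }
        List<String> sorted = new ArrayList<>(rowKeys);
        Collections.sort(sorted); // assumes ASCII keys (see lead-in)
        List<List<String>> splits = new ArrayList<>();
        if (sorted.isEmpty()) {
            return splits;
        }
        int chunk = (sorted.size() + numSplits - 1) / numSplits; // ceiling division
        for (int start = 0; start < sorted.size(); start += chunk) {
            int end = Math.min(start + chunk, sorted.size());
            splits.add(new ArrayList<>(sorted.subList(start, end)));
        }
        return splits;
    }

    // Write each chunk to its own file in dir, one row key per line.
    static List<Path> writeSplits(List<List<String>> splits, Path dir) throws IOException {
        Files.createDirectories(dir);
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            Path file = dir.resolve(String.format("rowkeys-%05d.txt", i));
            Files.write(file, splits.get(i), StandardCharsets.UTF_8);
            files.add(file);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the list returned by the external index.
        List<String> keys = Arrays.asList("row3", "row1", "row2", "row5", "row4");
        Path dir = Files.createTempDirectory("rowkey-splits");
        for (Path p : writeSplits(partition(keys, 2), dir)) {
            System.out.println(p + " -> " + Files.readAllLines(p, StandardCharsets.UTF_8));
        }
    }
}
```

Sorting before chunking is a design choice rather than a requirement: it keeps the gets issued by one mapper close together in key space, which may help locality, at the cost of an in-memory sort of the key list.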
