Did you have a look at Lily? A billion items will be interesting, but we offer M/R index rebuilds (against SOLR) as well as incremental updates. You could also take a look at the RowLog library we built to do this in a robust way - it has no Lily dependencies.
www.lilyproject.org

Cheers,

Steven.

On Tue, Oct 12, 2010 at 2:36 PM, Michael Segel <[email protected]> wrote:

> Hi,
>
> Now I realize that most everyone is sitting in NY, while some of us can't
> leave our respective cities....
>
> I came across this problem and was wondering how others have solved it.
>
> Suppose you have a really large table with 1 billion rows of data.
> Since HBase doesn't really have any indexes built in (don't get me started
> about the contrib/transactional stuff...), you're forced to use some sort of
> external index, or roll your own index table.
>
> The net result is that you end up with a list object that contains your
> result set.
>
> So the question is: what's the best way to feed that list object in?
>
> One option I thought about is writing the object to a file, using that file
> as the job input, and then controlling the splits. Not the most efficient,
> but it would work.
>
> I was trying to find a more elegant solution, and I'm sure that anyone
> using SOLR or Lucene or whatever has come across this problem too.
>
> Any suggestions?
>
> Thx

-- 
Steven Noels
http://outerthought.org/
Open Source Content Applications
Makers of Kauri, Daisy CMS and Lily
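The "write the list to a file and control the splitters" idea from the quoted question can be sketched in plain Java. This is only an illustration of the chunking step: the class name `KeySplitter` and the contiguous-chunk strategy are assumptions of mine, not anything from the thread. In a real Hadoop job each chunk would back a custom `InputSplit` (or the key file would be fed through something like `NLineInputFormat`), and each mapper would issue HBase Gets for its keys.

```java
import java.util.ArrayList;
import java.util.List;

public class KeySplitter {
    // Partition a result-set of row keys into nSplits contiguous chunks,
    // mimicking how a custom InputFormat might assign keys to mappers.
    static List<List<String>> split(List<String> keys, int nSplits) {
        List<List<String>> splits = new ArrayList<List<String>>();
        // Ceiling division: every split gets at most `per` keys.
        int per = (keys.size() + nSplits - 1) / nSplits;
        for (int i = 0; i < keys.size(); i += per) {
            splits.add(new ArrayList<String>(
                keys.subList(i, Math.min(i + per, keys.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Stand-in for the list object holding the external index's hits.
        List<String> keys = new ArrayList<String>();
        for (int i = 0; i < 10; i++) keys.add("row-" + i);

        List<List<String>> splits = split(keys, 3);
        System.out.println(splits.size());        // number of splits produced
        System.out.println(splits.get(0).size()); // keys in the first split
    }
}
```

Writing each chunk to its own HDFS file (one key per line) then gives the job one split per file without any custom InputFormat code, at the cost of a pass over the result set up front.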
