Thank you, everybody. I summarized your advice here, http://shmsoft.blogspot.com/2011/06/search-in-ediscovery.html, because I need it for my open source eDiscovery, and now just need to try it all :)

Sincerely,
Mark

On Mon, Jun 6, 2011 at 11:18 AM, Buttler, David <[email protected]> wrote:

> I store over 500M documents in HBase, and index using Solr with dynamic
> fields. This gives you tremendous flexibility to do the type of queries you
> are looking for -- and to make them simple and intuitive via a faceted
> interface.
>
> However, there was quite a bit of software that we had to write to get
> things going, and I can neither release all of it open source, or support
> other people using it. If I had to start again, I would seriously look at
> solutions like elastic search and lily.
>
> Dave
>
> -----Original Message-----
> From: Mark Kerzner [mailto:[email protected]]
> Sent: Friday, June 03, 2011 5:57 PM
> To: HBase Discussion Group
> Subject: What's the best approach to search in HBase?
>
> Hi,
>
> I need to store, say, 10M-100M documents, with each document having say 100
> fields, like author, creation date, access date, etc., and then I want to
> ask questions like
>
> give me all documents whose author is like abc**, and creation date any
> time in 2010 and access date in 2010-2011, and so on, perhaps 10-20
> conditions, matching a list of some keywords.
>
> What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
> and compare of every record?
>
> Thanks a bunch!
>
> Mark
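
For reference, here is a rough sketch of how the query from the original question (author prefix, creation date in 2010, access date in 2010-2011, plus keywords) might be expressed as Solr request parameters. Nothing here comes from Dave's actual setup: the field names (author_s, created_dt, accessed_dt, text), the helper build_solr_params, and the localhost URL are all assumptions, with the suffixes following the *_s and *_dt dynamic-field patterns found in Solr's example schema. It only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_params(author_prefix, created_year, accessed_years, keywords):
    """Build Solr /select parameters for a multi-condition document query.

    All field names are hypothetical and assume *_s (string) and *_dt (date)
    dynamic fields plus a catch-all "text" field.
    """
    first, last = accessed_years
    # Each structured condition becomes its own filter query (fq), so Solr
    # can cache and intersect the filters independently of the keyword query.
    filters = [
        # Prefix match on the author field.
        f"author_s:{author_prefix}*",
        # Creation date anywhere in the given year.
        f"created_dt:[{created_year}-01-01T00:00:00Z TO {created_year}-12-31T23:59:59Z]",
        # Access date anywhere in the inclusive range of years.
        f"accessed_dt:[{first}-01-01T00:00:00Z TO {last}-12-31T23:59:59Z]",
    ]
    # Keywords go into the main query; fall back to match-all if none given.
    # Note: keywords are not escaped here, so special characters need care.
    q = " AND ".join(f"text:{kw}" for kw in keywords) if keywords else "*:*"
    params = [("q", q)]
    params += [("fq", f) for f in filters]
    params.append(("rows", "10"))
    return params


params = build_solr_params("abc", 2010, (2010, 2011), ["contract", "merger"])
# Hypothetical Solr endpoint; adjust host, port, and core for a real install.
print("http://localhost:8983/solr/select?" + urlencode(params))
```

With 10-20 conditions, the same pattern just means more fq entries, one per field.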
