I store over 500M documents in HBase and index them in Solr using dynamic 
fields.  This gives you tremendous flexibility to run the kind of queries you 
are looking for, and to make them simple and intuitive through a faceted 
interface.
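
To make that concrete, here is a rough sketch of how a query like the one you 
describe might be put together for Solr's /select handler. It is only an 
illustration, not our production code: the field names (author_s, created_dt, 
accessed_dt) are made up and assume a schema with the usual *_s and *_dt 
dynamic field patterns, and the keywords are matched against whatever default 
search field is configured.

```python
from urllib.parse import urlencode


def build_solr_query(author_prefix, created_year, accessed_years,
                     keywords, facet_fields):
    """Return the URL-encoded parameters for a Solr /select request.

    All field names below are hypothetical dynamic fields.
    """
    def year_range(first, last):
        # Solr date range covering whole calendar years, inclusive.
        return f"[{first}-01-01T00:00:00Z TO {last}-12-31T23:59:59Z]"

    params = [
        # Keywords go in the main query; match everything if none are given.
        ("q", " AND ".join(keywords) if keywords else "*:*"),
        # Each structured condition becomes its own filter query (fq).
        ("fq", f"author_s:{author_prefix}*"),
        ("fq", f"created_dt:{year_range(created_year, created_year)}"),
        ("fq", f"accessed_dt:{year_range(*accessed_years)}"),
        # Ask Solr for facet counts to drive a faceted interface.
        ("facet", "true"),
    ]
    params += [("facet.field", name) for name in facet_fields]
    return urlencode(params)


query_string = build_solr_query(
    "abc", 2010, (2010, 2011), ["hbase", "search"], ["author_s"]
)
```

The resulting string would be appended to the core's /select URL. Keeping each 
condition in a separate fq lets Solr cache the filters independently, which 
helps when the same 10-20 conditions are combined in different ways. A real 
version would also need to escape special characters in user-supplied values.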

However, we had to write quite a bit of software to get things going, and I 
can neither release all of it as open source nor support other people using 
it.  If I were starting again, I would seriously look at solutions like 
Elasticsearch and Lily.

Dave

-----Original Message-----
From: Mark Kerzner [mailto:[email protected]] 
Sent: Friday, June 03, 2011 5:57 PM
To: HBase Discussion Group
Subject: What's the best approach to search in HBase?

Hi,

I need to store, say, 10M-100M documents, with each document having say 100
fields, like author, creation date, access date, etc., and then I want to
ask questions like

give me all documents whose author is like abc**, and creation date any time
in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
matching a list of some keywords.

What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
and compare of every record?

Thanks a bunch!

Mark
