I store over 500M documents in HBase and index them with Solr using dynamic fields. This gives you tremendous flexibility to run the type of queries you are looking for, and to make them simple and intuitive via a faceted interface.
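To illustrate the kind of query this setup allows, here is a minimal Python sketch that only builds a Solr select URL and does not contact a server. It assumes a Solr core at a hypothetical URL and a schema with dynamic-field patterns such as `*_s` (string) and `*_dt` (date), so `author_s`, `created_dt` and `accessed_dt` are hypothetical field names that need no explicit declaration. Each condition becomes its own `fq` filter query, and `facet.field` supplies the counts a faceted interface would display.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; adjust host, port and core name for a real setup.
SOLR_SELECT = "http://localhost:8983/solr/documents/select"


def build_query(author_prefix, created_range, accessed_range, keywords):
    """Build a Solr select URL that combines several field conditions.

    Assumes dynamic fields like *_s and *_dt, so author_s, created_dt and
    accessed_dt (hypothetical names) exist without being declared one by one.
    """
    params = [
        # Free-text keywords go into the main query; match all if none given.
        ("q", " ".join(keywords) if keywords else "*:*"),
        # Each condition is a separate filter query.
        ("fq", f"author_s:{author_prefix}*"),
        ("fq", "created_dt:[{} TO {}]".format(*created_range)),
        ("fq", "accessed_dt:[{} TO {}]".format(*accessed_range)),
        # Facet counts on the author field, for a faceted navigation UI.
        ("facet", "true"),
        ("facet.field", "author_s"),
        ("wt", "json"),
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_query(
    "abc",
    ("2010-01-01T00:00:00Z", "2010-12-31T23:59:59Z"),
    ("2010-01-01T00:00:00Z", "2011-12-31T23:59:59Z"),
    ["hbase", "solr"],
)
```

Keeping each condition in a separate `fq` rather than one long `q` means that adding a tenth or twentieth condition is just another tuple in the list.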
However, there was quite a bit of software we had to write to get things going, and I can neither release all of it as open source nor support other people using it. If I had to start again, I would seriously look at solutions like Elasticsearch and Lily.

Dave

-----Original Message-----
From: Mark Kerzner
Sent: Friday, June 03, 2011 5:57 PM
To: HBase Discussion Group
Subject: What's the best approach to search in HBase?

Hi,

I need to store, say, 10M-100M documents, each with, say, 100 fields, like author, creation date, access date, etc. Then I want to ask questions like: give me all documents whose author is like abc*, whose creation date is any time in 2010, and whose access date is in 2010-2011, and so on, perhaps 10-20 conditions, matching a list of some keywords.

What's best: Lucene, Katta, an HBase CF with secondary indices, or a plain scan and compare of every record?

Thanks a bunch!
Mark
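For comparison with the "HBase CF with secondary indices" option in the question, here is a minimal in-memory Python sketch of that idea. It relies on the fact that HBase keeps rows sorted by row key; a sorted Python list with `bisect` stands in for an index table, and the key layout `<field>\x00<value>\x00<doc_id>` and the `SortedIndex` class are hypothetical, not an HBase API. A prefix condition such as author like `abc*` becomes the key range `[abc, abd)`, and a date condition becomes a range scan.

```python
import bisect


class SortedIndex:
    """Hypothetical stand-in for an HBase secondary-index table.

    Keys are "<field>\\x00<value>\\x00<doc_id>" kept in sorted order, the way
    HBase orders row keys, so one condition is answered by a range scan.
    """

    SEP = "\x00"

    def __init__(self):
        self.keys = []  # always sorted, like HBase row keys

    def add(self, field, value, doc_id):
        bisect.insort(self.keys, self.SEP.join((field, value, doc_id)))

    def scan(self, field, start, stop):
        """Return ids of docs whose value v satisfies start <= v < stop."""
        lo = bisect.bisect_left(self.keys, field + self.SEP + start)
        hi = bisect.bisect_left(self.keys, field + self.SEP + stop)
        # The doc id is the last component of each matching key.
        return {k.rsplit(self.SEP, 1)[1] for k in self.keys[lo:hi]}

    def prefix(self, field, prefix):
        """Emulate field LIKE 'prefix*' as the range [prefix, next prefix)."""
        stop = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        return self.scan(field, prefix, stop)


idx = SortedIndex()
idx.add("author", "abcd", "doc1")
idx.add("author", "abzz", "doc2")
idx.add("created", "2010-03-15", "doc1")
idx.add("created", "2009-12-31", "doc2")

# author like abc* AND creation date in 2010: one scan per condition,
# then intersect the id sets on the client.
hits = idx.prefix("author", "abc") & idx.scan("created", "2010", "2011")
print(hits)  # {'doc1'}
```

Each extra condition costs one more scan plus a client-side intersection, so with 10-20 conditions per query this is the kind of custom code a search engine such as Solr otherwise handles for you.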
