On Sat, Jun 4, 2011 at 2:57 AM, Mark Kerzner <[email protected]> wrote:
> Hi,
>
> I need to store, say, 10M-100M documents, with each document having say 100
> fields, like author, creation date, access date, etc., and then I want to
> ask questions like
>
> give me all documents whose author is like abc**, and creation date any time
> in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
> matching a list of some keywords.
>
> What's best, Lucene, Katta, HBase CF with secondary indices, or plain scan
> and compare of every record?
>

I'd say give Lily a spin. Currently, we rely on Solr for search. In the next
few months, we'll take a good look at "HBase-native" secondary indexes as
well. Lily can be found at www.lilyproject.org.

Thanks,
Steven.

--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily
