Thank you, everybody. I summarized your advice here,
http://shmsoft.blogspot.com/2011/06/search-in-ediscovery.html, because I
need it for my open source eDiscovery project, and now I just need to try it all :)
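
For anyone reading this thread later, here is a rough sketch of what the query from my original question might look like as a Solr request, following Dave's dynamic-field suggestion. The field names (author_s, created_dt, accessed_dt, text), the facet field, the sample keywords, and the /solr/select path are all assumptions for illustration, not Dave's actual setup. The code only builds the query string and never contacts a server.

```python
from urllib.parse import urlencode


def build_solr_params(author_prefix, created_year, accessed_years, keywords):
    """Build Solr select parameters for a multi-condition document search.

    All field names are hypothetical dynamic fields (suffix _s for strings,
    _dt for dates); adjust them to whatever the real schema defines.
    """
    first_access, last_access = accessed_years
    filters = [
        # Prefix match on the author field, e.g. author_s:abc*
        f"author_s:{author_prefix}*",
        # Creation date anywhere inside one calendar year
        f"created_dt:[{created_year}-01-01T00:00:00Z TO {created_year}-12-31T23:59:59Z]",
        # Access date inside an inclusive range of years
        f"accessed_dt:[{first_access}-01-01T00:00:00Z TO {last_access}-12-31T23:59:59Z]",
    ]
    return [
        # Keyword match goes in the main query (keywords are not escaped here)
        ("q", " AND ".join(f"text:{kw}" for kw in keywords)),
        # Each structured condition becomes its own filter query
        *[("fq", f) for f in filters],
        # Ask for facet counts on the author field
        ("facet", "true"),
        ("facet.field", "author_s"),
        ("rows", "10"),
    ]


params = build_solr_params("abc", 2010, (2010, 2011), ["contract", "merger"])
query_string = urlencode(params)
print("/solr/select?" + query_string)
```

Each extra condition would just be another fq entry, so 10-20 conditions should stay manageable. As far as I understand, Solr can also cache filter queries separately from the main query.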

Sincerely,
Mark

On Mon, Jun 6, 2011 at 11:18 AM, Buttler, David <[email protected]> wrote:

> I store over 500M documents in HBase, and index using Solr with dynamic
> fields.  This gives you tremendous flexibility to do the type of queries you
> are looking for -- and to make them simple and intuitive via a faceted
> interface.
>
> However, there was quite a bit of software that we had to write to get
> things going, and I can neither release all of it as open source nor support
> other people using it.  If I had to start again, I would seriously look at
> solutions like ElasticSearch and Lily.
>
> Dave
>
> -----Original Message-----
> From: Mark Kerzner [mailto:[email protected]]
> Sent: Friday, June 03, 2011 5:57 PM
> To: HBase Discussion Group
> Subject: What's the best approach to search in HBase?
>
> Hi,
>
> I need to store, say, 10M-100M documents, with each document having say 100
> fields, like author, creation date, access date, etc., and then I want to
> ask questions like
>
> give me all documents whose author is like abc**, and creation date any time
> in 2010 and access date in 2010-2011, and so on, perhaps 10-20 conditions,
> matching a list of some keywords.
>
> What's best: Lucene, Katta, an HBase CF with secondary indices, or a plain
> scan and compare of every record?
>
> Thanks a bunch!
>
> Mark
>
