Online text search with Hadoop/Brisk

2011-05-11 Thread Ben Scholl
I keep reading that Hadoop/Brisk is not suitable for online querying, only
for offline/batch processing. What exactly are the reasons it is unsuitable?
My use case is a fairly high query load, and each query ideally would return
within about 20 seconds. The queries will use indexes to narrow down the
result set first, but they also need to support text search on one of the
fields. I was thinking of simulating the SQL LIKE statement, by running each
query as a MapReduce job so that the text search gets distributed between
nodes.
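
A rough sketch of that idea, in plain Python rather than real Hadoop code,
might look like the following. The names like_to_regex, map_like and run_job,
the "body" field, and the sample rows are all made up for illustration; in an
actual job each split would be scanned by a mapper on the node holding it.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")      # % matches any run of characters
        elif ch == "_":
            parts.append(".")       # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def map_like(record, field, regex):
    """Map step: emit (key, row) only when the field matches the pattern."""
    key, row = record
    if regex.fullmatch(row.get(field, "")):
        yield key, row

def run_job(splits, field, pattern):
    """Stand-in for the job driver; Hadoop would run the splits in parallel."""
    regex = like_to_regex(pattern)
    results = []
    for split in splits:
        for record in split:
            results.extend(map_like(record, field, regex))
    return results

# Hypothetical data: two splits, as if stored on two different nodes.
splits = [
    [(1, {"body": "hadoop is batch oriented"}), (2, {"body": "cassandra ring"})],
    [(3, {"body": "brisk ties hadoop to cassandra"})],
]
matches = run_job(splits, "body", "%hadoop%")
print([key for key, _ in matches])
```

Each query still has to read every row that survives the index filter, and a
real MapReduce job adds scheduling and startup overhead on top of that scan.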

I know the recommended approach is to keep a separate full-text index, but
that could be quite space-intensive, and also means you can only search on
complete words. Any thoughts on this approach?

Thanks,

Ben


Re: Online text search with Hadoop/Brisk

2011-05-11 Thread Edward Capriolo
On Wed, May 11, 2011 at 11:19 AM, Ben Scholl brsch...@gmail.com wrote:
 I keep reading that Hadoop/Brisk is not suitable for online querying, only
 for offline/batch processing. What exactly are the reasons it is unsuitable?
 My use case is a fairly high query load, and each query ideally would return
 within about 20 seconds. The queries will use indexes to narrow down the
 result set first, but they also need to support text search on one of the
 fields. I was thinking of simulating the SQL LIKE statement, by running each
 query as a MapReduce job so that the text search gets distributed between
 nodes.
 I know the recommended approach is to keep a separate full-text index, but
 that could be quite space-intensive, and also means you can only search on
 complete words. Any thoughts on this approach?
 Thanks,
 Ben

Brisk was made to be a tight integration of Cassandra, Hadoop, and Hive.

If you are looking to do full-text searches, you should look at Solandra,
https://github.com/tjake/Solandra, which is a Cassandra backend for
the Solr/Lucene indexes.

Edward