Text searches and free form queries

Oleg Dulin Mon, 03 Sep 2012 06:26:13 -0700

Dear Distinguished Colleagues:

I need to add full-text search and somewhat free-form queries to my application. Our data is made up of "items" that are stored in a single column family, and we have a bunch of secondary indices for lookups. An item has header fields and data fields, and the items CF is a super column family with the row key being the item's natural ID, one super column for the header, and one super column for the data.

Our application is made up of several redundant, load-balanced servers all pointing at a Cassandra cluster. Our servers run embedded Jetty.

I need to be able to find items by a combination of field values. Currently I have an index for items by field value which works reasonably well. I could also add support for data types and index items by fields of appropriate types, so we can do range queries on items.
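To make the idea concrete, here is a minimal in-memory sketch of an "items by field value" index with typed range queries. It is only an illustration: the class name FieldIndex, its methods, and the field names are hypothetical, and plain Python dicts and sorted lists stand in for the index column families. It assumes all values of a given field share one comparable type.

```python
from bisect import bisect_left, bisect_right, insort
from collections import defaultdict
from datetime import date


class FieldIndex:
    """Hypothetical in-memory stand-in for an 'items by field value' index."""

    def __init__(self):
        # field name -> value -> set of item IDs (exact-match lookups)
        self._exact = defaultdict(lambda: defaultdict(set))
        # field name -> sorted list of (value, item ID) pairs (range lookups)
        self._sorted = defaultdict(list)

    def add(self, item_id, fields):
        """Index one item under each of its field values."""
        for name, value in fields.items():
            self._exact[name][value].add(item_id)
            insort(self._sorted[name], (value, item_id))

    def find(self, **criteria):
        """Return IDs of items that match every field=value pair."""
        result = None
        for name, value in criteria.items():
            ids = self._exact[name].get(value, set())
            result = set(ids) if result is None else result & ids
        return result or set()

    def find_range(self, name, low, high):
        """Return IDs of items whose field lies in [low, high]."""
        entries = self._sorted[name]
        values = [value for value, _ in entries]
        start = bisect_left(values, low)
        end = bisect_right(values, high)
        return {item_id for _, item_id in entries[start:end]}


idx = FieldIndex()
idx.add("item-1", {"status": "open", "created": date(2012, 8, 30)})
idx.add("item-2", {"status": "open", "created": date(2012, 9, 2)})
idx.add("item-3", {"status": "closed", "created": date(2012, 9, 3)})

# Combine an exact match with a date range by intersecting the ID sets.
open_in_september = idx.find(status="open") & idx.find_range(
    "created", date(2012, 9, 1), date(2012, 9, 30)
)
print(open_in_september)
```

Combining criteria is just set intersection on item IDs, which is why this approach handles field values and ranges but gives no help with free text, suggestions, or language sensitivity.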

Ultimately, though, what we want is full text search with suggestions and human language sensitivity. We want to search by date ranges, by field values, etc. I did some homework on this topic, and here is what I see as options:

1) Use an SQL database as a helper. This is rather clunky, not sure what it gets us since just about anything that can be done in SQL can be done in Cassandra with proper structures. Then the problem here also is where am I going to get an open source database that can handle the workload? Probably nowhere, nor do I get natural language support.

2) Each of our servers can index data using Lucene, but again we have to come up with a clunky mechanism where either one of the servers does the indexing and results are replicated, or each server does its own indexing.

3) We can use Solr as is, perhaps with some small modifications it can run within our server JVM -- since we already run embedded Jetty. I like this idea, actually, but I know that Solr indexing doesn't take advantage of Cassandra.

4) Datastax Enterprise with search, presumably, supports Solr indexing of existing column families -- but for the life of me I couldn't figure out how exactly it does that. The Wikipedia example shows that Solr can create column families based on Solr schemas that I can then query using Cassandra itself (which is great) and supposedly I can modify those column families directly and Solr will reindex them (which is even better), but I am not sure how that fits into our server design. The other concern is locking in to a commercial product, something I am very much worried about.

So, one possibility I can see is using Solr embedded within our own server solution but storing its indexes in the file system outside of Cassandra. This is not optimal, and maybe over time I can add my own support for storing the Solr index in Cassandra w/o relying on the Datastax solution.


In any case, what are your thoughts and experiences?


Regards,
Oleg
