Re: on regards to Solr and NoSQL storages integration

Jack Krupansky Sat, 08 Nov 2014 20:13:25 -0800

There is no "double storage" of data - the Solr index for DataStax Enterprise ignores the "stored" attribute and only stores the primary key data to allow the Solr document to reference the Cassandra row, which is where the data is stored. The exception would be doc values, where the data does need to be kept in the index for efficient operation of Lucene and Solr, but that would only be done for fields such as facet fields and is under the complete control of the developer.

DataStax Enterprise also utilizes an indexing queue so that Cassandra inserts and updates can occur at full speed, with indexing in a background thread, maximizing ingestion performance.
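
As a rough illustration of the queueing idea only (this is not DataStax code; the class and method names are made up for the sketch, and plain dicts stand in for Cassandra rows and the Solr index), a write can return as soon as the row is stored while a background thread catches up on indexing:

```python
import queue
import threading


class QueuedIndexStore:
    """Toy model: writes hit the row store at once, indexing runs in the background."""

    def __init__(self):
        self.rows = {}            # key -> text; stand-in for the Cassandra rows
        self.index = {}           # term -> set of keys; stand-in for the Solr index
        self._terms_by_key = {}   # key -> terms currently indexed for that key
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._index_loop, daemon=True)
        worker.start()

    def write(self, key, text):
        # The write path only stores the row and enqueues the key.
        self.rows[key] = text
        self._pending.put(key)

    def _index_loop(self):
        while True:
            key = self._pending.get()
            try:
                # Drop postings left over from an earlier version of the row.
                for term in self._terms_by_key.get(key, set()):
                    self.index[term].discard(key)
                terms = set(self.rows.get(key, "").lower().split())
                for term in terms:
                    self.index.setdefault(term, set()).add(key)
                self._terms_by_key[key] = terms
            finally:
                self._pending.task_done()

    def wait_until_indexed(self):
        # Block until every queued key has been indexed.
        self._pending.join()

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

The trade-off in a design like this is that a search issued right after a write may not see the new row until the queue drains, which is why the sketch exposes wait_until_indexed().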


-- Jack Krupansky

-----Original Message-----
From: andrey prokopenko

Sent: Friday, November 7, 2014 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: on regards to Solr and NoSQL storages integration

Thanks for the reply. I've considered DataStax, but dropped it, first
because of the commercial model they're using and second because of the
way they have chosen to integrate with Cassandra. Their docs (which can be
found here:
http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_load_data)
do not disclose the architecture and details of their integration
solution, yet examining the Solr configuration and handlers in
their distribution package revealed that they essentially let the docs
rest in both the Solr index and Cassandra storage. To safely propagate
documents to Cassandra on each Solr index update, they use their own
update handler plus a custom update log.
In my opinion, this is not very efficient, because it doubles document
storage and leaves the Solr index as heavy as it is currently. My approach
completely delegates stored-field storage to the NoSQL database, using a
user-defined unique key. This lets users quickly do partial updates of
stored but non-indexed fields and greatly reduces the time required for
replication under heavy write load.
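
To make the idea concrete, here is a minimal sketch under my own assumptions (it is not the actual implementation; the class, its methods and the index_writes counter are hypothetical, and a dict stands in for the NoSQL database). The index holds only postings plus the unique key, and a partial update of a stored-only field never touches the index:

```python
class ExternalStoredFieldsIndex:
    """Toy model: the index keeps postings and keys, the store keeps field values."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.postings = {}     # (field, term) -> set of unique keys
        self.store = {}        # unique key -> field values; stand-in for the NoSQL DB
        self.index_writes = 0  # counts how often the index itself was modified

    def add(self, key, doc):
        # All field values go to the external store, keyed by the unique key.
        self.store[key] = dict(doc)
        # Only indexed fields produce postings; no field values are kept here.
        for field in self.indexed_fields:
            if field in doc:
                for term in str(doc[field]).lower().split():
                    self.postings.setdefault((field, term), set()).add(key)
        self.index_writes += 1

    def update_stored_field(self, key, field, value):
        # Partial update of a stored, non-indexed field: store only, no reindex.
        if field in self.indexed_fields:
            raise ValueError("indexed fields need a full reindex of the document")
        self.store[key][field] = value

    def search(self, field, term):
        # Resolve matching keys in the index, then fetch values from the store.
        keys = sorted(self.postings.get((field, term.lower()), set()))
        return [self.store[k] for k in keys]
```

For example, after add("doc1", {"title": "Solr and Cassandra", "views": 1}) with only "title" indexed, calling update_stored_field("doc1", "views", 2) changes what search() returns for that document while index_writes stays at 1.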

On Wed, Nov 5, 2014 at 4:04 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

On 5 November 2014 08:52, andrey prokopenko <andrey4...@gmail.com> wrote:
> I assume, there might be other developers, trying to solve similar
> problems, so I'd be interested to hear about similar attempts & issues
> encountered while trying to implement such an integration between Solr
> and other NoSQL databases.

I think DataStax does Solr+Cassandra and Cloudera does Solr+Hadoop
with underlying content stored in the databases. Also Neo4j has
graph+search integration, but I think it uses the Lucene engine
directly, not Solr.

Disclaimer: this is a very high-level understanding; hopefully other
people can confirm.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
