There is no "double storage" of data - the Solr index for DataStax Enterprise ignores the "stored" attribute and only stores the primary key data to allow the Solr document to reference the Cassandra row, which is where the data is stored. The exception would be doc values, where the data does need to be kept in the index for efficient operation of Lucene and Solr, but that would only be done for fields such as facet fields and is under the complete control of the developer.

DataStax Enterprise also utilizes an indexing queue so that Cassandra inserts and updates can occur at full speed, with indexing in a background thread, maximizing ingestion performance.
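
For readers who want to picture the queueing idea, here is a minimal sketch of a write path that stores synchronously and indexes in a background thread. It is only an illustration, not DataStax's implementation: the class `QueuedIndexer`, its `store` and `index` dicts (stand-ins for the Cassandra table and the search index), and the `flush` helper are all hypothetical names, and the sketch ignores updates and deletes of already-indexed terms.

```python
import queue
import threading


class QueuedIndexer:
    """Toy write path: store the row right away, index it later."""

    def __init__(self):
        self.store = {}   # hypothetical stand-in for the Cassandra table
        self.index = {}   # hypothetical stand-in for the index: term -> keys
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def write(self, key, text):
        # Returns as soon as the row is stored; indexing is deferred.
        self.store[key] = text
        self._pending.put((key, text))

    def _drain(self):
        # Background thread: pull queued rows and add their terms to the index.
        while True:
            key, text = self._pending.get()
            try:
                for term in text.lower().split():
                    self.index.setdefault(term, set()).add(key)
            finally:
                self._pending.task_done()

    def flush(self):
        # Block until everything queued so far has been indexed.
        self._pending.join()

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))
```

A caller would issue `write(...)` calls at full speed and only call `flush()` when it needs search results to reflect everything written so far.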

-- Jack Krupansky

-----Original Message-----
From: andrey prokopenko
Sent: Friday, November 7, 2014 5:00 AM
To: solr-user@lucene.apache.org
Subject: Re: on regards to Solr and NoSQL storages integration

Thanks for the reply. I've considered DataStax, but dropped it, first due to
the commercial model they're using and second due to the way they have
chosen to integrate with Cassandra. In their docs (can be found
here:
http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_load_data),
they do not disclose the architecture and details of their integration
solution, yet an examination of the Solr configuration and handlers from
their distribution package revealed that they essentially let the docs
rest both in the Solr index and in Cassandra storage. To safely propagate
documents to Cassandra on each Solr index update, they use their own
update handler + custom update log.
In my opinion, this is not very efficient, because it doubles docs storage
and leaves the Solr index as heavy as it is currently. My approach completely
delegates stored-field storage to the NoSQL database, using a user-defined
unique key. This lets users quickly do partial updates of stored but
non-indexed fields and greatly reduces the time required for
replication under heavy write load.
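
To make the approach described above concrete, here is a minimal sketch under stated assumptions. The class `ExternalStoredFields`, its `kv` dict (a stand-in for the NoSQL database), the `index` dict (a stand-in for the Solr index), and the `reindex_count` counter are all hypothetical and exist only for illustration. The point it shows is that a partial update touching only stored, non-indexed fields never reaches the index.

```python
class ExternalStoredFields:
    """Toy model: the index holds only postings and keys; the KV store holds the docs."""

    def __init__(self, indexed_fields):
        self.indexed_fields = set(indexed_fields)
        self.index = {}         # (field, term) -> set of unique keys
        self.kv = {}            # unique key -> stored fields (NoSQL stand-in)
        self.reindex_count = 0  # how many times the index was touched

    def add(self, key, doc):
        self.kv[key] = dict(doc)
        self._reindex(key)

    def _reindex(self, key):
        self.reindex_count += 1
        # Drop the old postings for this key, then rebuild them.
        for keys in self.index.values():
            keys.discard(key)
        doc = self.kv[key]
        for field in self.indexed_fields.intersection(doc):
            for term in str(doc[field]).lower().split():
                self.index.setdefault((field, term), set()).add(key)

    def partial_update(self, key, changes):
        self.kv[key].update(changes)
        # Only touch the index when an indexed field actually changed.
        if self.indexed_fields.intersection(changes):
            self._reindex(key)

    def search(self, field, term):
        keys = sorted(self.index.get((field, term.lower()), set()))
        # Stored fields are fetched from the KV store by unique key.
        return [(k, self.kv[k]) for k in keys]
```

For example, with `title` as the only indexed field, updating a `views` counter changes `kv` but leaves `reindex_count` unchanged, whereas changing `title` triggers a reindex of that document.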

On Wed, Nov 5, 2014 at 4:04 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

On 5 November 2014 08:52, andrey prokopenko <andrey4...@gmail.com> wrote:
> I assume there might be other developers trying to solve similar
> problems, so I'd be interested to hear about similar attempts & issues
> encountered while trying to implement such an integration between Solr
> and other NoSQL databases.

I think DataStax does Solr+Cassandra and Cloudera does Solr+Hadoop
with underlying content stored in the databases. Also Neo4J has
graph+search integration, but I think it's directly using Lucene
engine, not Solr.

Disclaimer: this is a very high-level understanding; hopefully other
people can confirm.

Regards,
   Alex.

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

