Hi,

We currently have a single Solr server, with a single index. There are
a number of CMS processes distributed over a number of servers, with
each CMS process sending an update to the Solr index when changes are
made to a content object.

My concern is that a content object could be changed and reindexed
concurrently by two CMS processes. The database ensures consistency
within the CMS: the transactions are committed in order as T1 and T2.
But I cannot see how to ensure that the corresponding reindexing
operations (each resulting in a delete and add for the document) are
processed in the order R1 then R2, rather than R2 then R1. In the
second case the index record is left inconsistent with the content
object in the database.

I would like to supply a transaction id with the reindex request, and
configure Solr such that a reindex operation is processed if and only
if the supplied transaction id is greater than the currently indexed
transaction id.
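
To make the semantics I am asking for concrete, here is a minimal
in-memory sketch in Python. It is not Solr code and uses no Solr API;
the class VersionedIndex and the method conditional_reindex are
hypothetical names invented purely for illustration. It only shows the
"apply if and only if the supplied transaction id is greater" rule for
the R2-then-R1 ordering described above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class VersionedIndex:
    """Toy stand-in for an index keyed by uniqueKey (hypothetical, not Solr)."""

    # Maps uniqueKey -> (transaction id, document body).
    docs: Dict[str, Tuple[int, dict]] = field(default_factory=dict)

    def conditional_reindex(self, key: str, txid: int, doc: dict) -> bool:
        """Replace the document only if txid is greater than the indexed one.

        Returns True if the update was applied, False if it was rejected
        as stale. A real implementation would need the check and the write
        to be atomic; this single-threaded sketch does not address that.
        """
        current = self.docs.get(key)
        if current is not None and txid <= current[0]:
            return False  # stale request: a newer version is already indexed
        self.docs[key] = (txid, doc)
        return True


index = VersionedIndex()

# R2 (from transaction T2) arrives before R1 (from transaction T1).
applied_r2 = index.conditional_reindex("doc-1", txid=2, doc={"title": "new"})
applied_r1 = index.conditional_reindex("doc-1", txid=1, doc={"title": "old"})
```

With this rule the late-arriving R1 is rejected, so the index keeps the
document written by T2 regardless of arrival order.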

Otherwise the only ways I can see to guarantee consistency are 1) to
have index operations processed by a single writer, or 2) to commit the
index operation between database prepare and commit statements.

The first is not desirable as we introduce a single point of failure
(in addition to the single Solr server) and delay updating the index.
The second is not desirable because it reduces the throughput of the
database, and with a distributed Solr setup would not solve the
problem.

From what I can tell this conditional indexing feature is not
supported by Solr. Might it be supported by Lucene but not exposed by
Solr?

Thanks,

Laurence

2008/12/4 Shalin Shekhar Mangar <[EMAIL PROTECTED]>:
> It is not clear how you are using Solr i.e. distributed vs single index.
>
> Summarily, Solr does not update documents. It overwrites the old document
> with the new one if an old document with the same uniqueKey exists in the
> index.
>
> Does that answer your question?
>
> On Thu, Dec 4, 2008 at 1:46 AM, Laurence Rowe <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Our CMS is distributed over a cluster and I was wondering how I can
>> ensure that index records of newer versions of documents are never
>> overwritten by older ones. Amazon AWS uses a timestamp on requests to
>> ensure 'eventual consistency' of operations. Is there a way to supply
>> a transaction ID with an update so an update is conditional on the
>> supplied transaction id being greater than the existing indexed
>> transaction id?
>>
>> Laurence
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
