Re: Lucene consistency in clustered environment

Dennis van der Laan Thu, 08 Sep 2011 00:29:59 -0700

On 1-9-2011 23:22, Jeroen Reijn wrote:
> On Wed, Aug 31, 2011 at 3:16 PM, Dennis van der Laan
> <[email protected]> wrote:
>> Ian, others,
>>
>> As with many 'bugs' that have a workaround, this bug has been lying
>> around for about a year now. We still have the problem that the
>> cluster-nodes have different lucene indexes. At first, we thought this
>> happened over time. Recently we made a copy of our production database
>> and used it with 4 new cluster nodes (we cleared the journal table and
>> the local revisions table, first). We started them all, completely
>> clean, at which point all nodes started to build the lucene index.
>> Without making any changes to the contents, we see different results for
>> jackrabbit search queries on these 4 cluster nodes. So it seems the
>> lucene indexes might differ more over time, but could differ right from
>> the start.
>>
>> Does anybody have a clue how this could happen? Are we missing something?
> I'm wondering what you mean with the statement: "different results for
> jackrabbit search queries".
When doing a full-text search (an XPath query with a 'contains' clause), a
document containing the queried text might show up in the results on some
cluster nodes but not on others. When we update such a document so that it
gets indexed again on all cluster nodes (hopefully), it may show up on all
cluster nodes again. I do not have numbers on how many documents are not
indexed on all cluster nodes, but it has happened too often to speak of
'an incident'.
> Could you perhaps show some of those queries? This could also be
> related to your indexing configuration.
I don't quite understand what you mean by 'related to your indexing
configuration'. We roll out our cluster nodes from a single templating
server, so the configuration for all cluster nodes is exactly the same,
except for the cluster id.
An example of a query which might not return the same results on all
cluster nodes:


/jcr:root/cms/documents//element(*, nt:file)/jcr:content[
    (jcr:contains(cms:searchData/@cms:title, 'academy assistent')
     or jcr:contains(@jcr:data, 'academy assistent'))
    and (@cms:type = 'article')
]/(@jcr:lastModified|rep:excerpt()|@cms:type)
order by @cms:sortfield ascending
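
To put numbers on the difference, one could run the query above against
each cluster node and compare the paths of the returned nodes. Below is a
minimal sketch in Java (8 or later) of that comparison only. It assumes the
result paths have already been collected per cluster node (for example
through the JCR QueryManager); the names IndexDiff and missingByPath are
hypothetical and not part of Jackrabbit.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Compares the result paths that each cluster node returned for the same query. */
final class IndexDiff {

    /**
     * For every path that is absent from at least one node's results,
     * returns the ids of the cluster nodes that did NOT return it.
     * Key of the input map: cluster node id; value: paths returned by that node.
     */
    static Map<String, Set<String>> missingByPath(Map<String, Set<String>> resultsByNode) {
        // Union of all paths returned by any cluster node.
        Set<String> allPaths = new TreeSet<>();
        for (Set<String> paths : resultsByNode.values()) {
            allPaths.addAll(paths);
        }

        // Record, per path, which cluster nodes failed to return it.
        Map<String, Set<String>> missing = new TreeMap<>();
        for (String path : allPaths) {
            for (Map.Entry<String, Set<String>> e : resultsByNode.entrySet()) {
                if (!e.getValue().contains(path)) {
                    missing.computeIfAbsent(path, k -> new TreeSet<>()).add(e.getKey());
                }
            }
        }
        return missing;
    }
}
```

An empty map would mean all cluster nodes returned the same documents; any
entry names a document and the cluster nodes whose index apparently lacks it.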
>
> I assume you do not have an index when starting one of the cluster nodes?
Not when we start a fresh cluster node for the first time, no.
>
> BTW which version of Jackrabbit are you experiencing this with?
We are currently still using Jackrabbit 1.6.1.

Thanks for taking a look at our problem!
Best regards,
Dennis
>
>> TIA
>> Dennis
>>
>> On 29-9-2010 12:37, Ian Boston wrote:
>>> On 29 Sep 2010, at 11:33, Dennis van der Laan wrote:
>>>
>>>> From your reply I
>>>> understand that this should not be the case with Lucene, is it?
>>> Every JournalRecord should have been replayed on every machine (at some 
>>> time later if the JVM was down). That *should* ensure that all documents 
>>> are indexed on all machines.
>>> Sounds like this is not happening in your environment.
>>>
>>> Ian
>>>
>>
>> --
>> Dennis van der Laan, MSc
>> Centre for Information Technology
>> University of Groningen
>>
>>
>
>


-- 
Dennis van der Laan, MSc
Centre for Information Technology
University of Groningen
