On 11/28/2019 1:30 AM, Dwane Hall wrote:
I asked a question on the forum a couple of weeks ago regarding cursorMark 
duplicates.  I initially thought it may be due to HDFSCaching because I was 
unable to replicate the issue on local indexes, but unfortunately the dreaded 
duplicates have returned!! As a refresher, I was seeing what I thought were 
duplicate documents appearing randomly on the last page of one cursor, and the 
first page of the next.  So if rows=50 the duplicates are document 50 on page 1 
and document 1 on page 2.

After further investigation I don't actually believe these documents are duplicates, but rather 
the same document being returned from a different replica on each page.  After running a 
diff on the two documents the only difference is the field "Solr_Update_Date" 
which I insert on each document as it is inserted into the corpus.

This is how the managed-schema mapping for this field looks:

<field name="Solr_Update_Date" type="rdate" indexed="true" stored="true" default="NOW" />

This can happen with SolrCloud using NRT replicas, which is the default replica type. Each NRT replica indexes the document independently, so a schema default of NOW is evaluated separately on each replica and the stored timestamps can differ. Based on the core names returned by the [shard] field in your responses, it looks like you do have NRT replicas.

There are two solutions. The better solution is to use TimestampUpdateProcessorFactory for setting your timestamp field instead of a default of NOW in the schema. An alternate solution is to use TLOG/PULL replica types instead of NRT -- that way replicas are populated by copying exact index contents instead of independently indexing.
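
As a rough sketch of the first option, a chain like the following in solrconfig.xml would set the timestamp once, before the update is distributed to the replicas. The chain name "add-timestamp" is only a placeholder, and making it the default chain is an assumption about your setup; the field name is taken from your schema.

```xml
<!-- Sketch only: the chain name is hypothetical, adjust to your config. -->
<updateRequestProcessorChain name="add-timestamp" default="true">
  <!-- Runs on the node that first receives the update, so every replica
       gets the same value instead of evaluating NOW on its own. -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">Solr_Update_Date</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- The timestamp processor must come before this one. -->
  <processor class="solr.DistributedUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With something like this in place, the default="NOW" attribute would be removed from the Solr_Update_Date field definition. Documents that are already indexed keep whatever value each replica stored, so they would most likely need to be reindexed before the replicas agree.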

Thanks,
Shawn
