You're basically re-implementing Solr's cursors.

You can change your system of reading docs from the old collection to 
use...

cursorMark=*&sort=timestamp+asc,id+asc

...and then instead of keeping track of the last timestamp & id values and 
constructing a filter, you can just keep track of the nextCursorMark and 
pass it the next time you want to check for newer documents...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
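
A rough sketch of that polling loop in Python, in case it helps. Everything 
here is illustrative rather than a drop-in client: the function names 
(build_query, http_fetch, read_new_docs), the q=*:* query and the rows 
value are my own assumptions, and it assumes "id" is your uniqueKey field 
(a cursor sort has to include the uniqueKey) and that you ask for wt=json. 
Solr signals "nothing more right now" by returning a nextCursorMark equal 
to the cursorMark you sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query(cursor_mark, rows=100):
    """Query string for one cursor page (q and rows are assumptions)."""
    return urlencode({
        "q": "*:*",
        "sort": "timestamp asc,id asc",  # assumes "id" is the uniqueKey
        "rows": rows,
        "cursorMark": cursor_mark,
        "wt": "json",
    })


def http_fetch(select_url):
    """Return a fetch(params) callable that hits a /select style URL."""
    def fetch(params):
        with urlopen(select_url + "?" + params) as resp:
            return json.load(resp)
    return fetch


def read_new_docs(fetch, cursor_mark="*"):
    """Read every doc after cursor_mark; return (docs, mark to save)."""
    docs = []
    while True:
        page = fetch(build_query(cursor_mark))
        docs.extend(page["response"]["docs"])
        next_mark = page["nextCursorMark"]
        if next_mark == cursor_mark:
            # Same mark back means no more docs for now.
            return docs, cursor_mark
        cursor_mark = next_mark
```

Save the returned mark somewhere durable and pass it back in on the next 
poll, instead of persisting the last timestamp and id and rebuilding a 
filter from them.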





: Date: Mon, 21 Sep 2015 21:32:33 +0300
: From: Gili Nachum <gilinac...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: How can I get a monotonically increasing field value for docs?
: 
: Thanks for the in-depth explanation!
: 
: The secondary sort by uuid would allow me to read a series of docs with
: identical time over multiple batches by specifying filtering
: time>timeOnLastReadDoc or (time=timeOnLastReadDoc and
: uuid>uuidOnLastReadDoc) which essentially creates a unique sorted value to
: track progress over.
: On Sep 21, 2015 19:56, "Shawn Heisey" <apa...@elyograg.org> wrote:
: 
: > On 9/21/2015 9:01 AM, Gili Nachum wrote:
: > > TimestampUpdateProcessorFactory takes place only on the leader shard,
: > > or on each shard replica?
: > > if on each replica then I would get different values on each replica.
: > >
: > > My alternative would be to perform secondary sort on a UUID to ensure
: > > order.
: >
: > If the update chain is configured properly, it runs on the leader, so
: > all replicas get the same timestamp.
: >
: > Without SolrCloud, the way to create an "indexed at" time field is in
: > the schema -- specify a default value of NOW on the field definition and
: > don't send the field when indexing.  The old master/slave replication
: > copies the actual index contents, so the indexed values in all replicas
: > are the same.
: >
: > The problem with NOW in the schema when running SolrCloud is that each
: > replica indexes the document independently, so each replica can have a
: > different timestamp.  This is why the timestamp update processor exists
: > -- to set the timestamp to a specific value before the document is
: > duplicated to each replica, eliminating the problem.
: >
: > FYI, secondary sort parameters affect the order when the primary sort
: > field is identical between two documents.  It may not do what you are
: > intending because of that.
: >
: > Thanks,
: > Shawn
: >
: >
: 

-Hoss
http://www.lucidworks.com/
