You're basically re-implementing Solr's cursors.
You can change your system of reading docs from the old collection to
use...

   cursorMark=*&sort=timestamp+asc,id+asc

...and then instead of keeping track of the last timestamp & id values and
constructing a filter, you can just keep track of the nextCursorMark and
pass it the next time you want to check for newer documents...

https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results


: Date: Mon, 21 Sep 2015 21:32:33 +0300
: From: Gili Nachum <gilinac...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Re: How can I get a monotonically increasing field value for docs?
:
: Thanks for the in-depth explanation!
:
: The secondary sort by uuid would allow me to read a series of docs with
: identical time over multiple batches by specifying filtering
: time>timeOnLastReadDoc or (time=timeOnLastReadDoc and
: uuid>uuidOnLastReadDoc) which essentially creates a unique sorted value to
: track progress over.
: On Sep 21, 2015 19:56, "Shawn Heisey" <apa...@elyograg.org> wrote:
:
: > On 9/21/2015 9:01 AM, Gili Nachum wrote:
: > > TimestampUpdateProcessorFactory takes place only on the leader shard,
: > > or on each shard replica?
: > > if on each replica then I would get different values on each replica.
: > >
: > > My alternative would be to perform secondary sort on a UUID to ensure
: > > order.
: >
: > If the update chain is configured properly, it runs on the leader, so
: > all replicas get the same timestamp.
: >
: > Without SolrCloud, the way to create an "indexed at" time field is in
: > the schema -- specify a default value of NOW on the field definition and
: > don't send the field when indexing. The old master/slave replication
: > copies the actual index contents, so the indexed values in all replicas
: > are the same.
: >
: > The problem with NOW in the schema when running SolrCloud is that each
: > replica indexes the document independently, so each replica can have a
: > different timestamp. This is why the timestamp update processor exists
: > -- to set the timestamp to a specific value before the document is
: > duplicated to each replica, eliminating the problem.
: >
: > FYI, secondary sort parameters affect the order when the primary sort
: > field is identical between two documents. It may not do what you are
: > intending because of that.
: >
: > Thanks,
: > Shawn
: >
: >
:

-Hoss
http://www.lucidworks.com/
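
For what it's worth, here is a rough sketch of the cursor approach described at the top of this message, in Python. It is only an illustration under some assumptions: the field names "timestamp" and "id" come from the sort shown above, while the helper names (cursor_params, read_new_docs) and the injected fetch callable are hypothetical, not part of Solr or any client library. The fetch callable is assumed to send the parameters to your collection's search handler and return the parsed JSON response.

```python
from typing import Callable, Dict, List, Tuple


def cursor_params(cursor_mark: str = "*", rows: int = 100) -> Dict[str, str]:
    """Build request parameters for one cursor page (hypothetical helper)."""
    return {
        "q": "*:*",
        # The sort must end on the uniqueKey field for cursors to work;
        # "timestamp" and "id" are the field names used in this thread.
        "sort": "timestamp asc,id asc",
        "rows": str(rows),
        "cursorMark": cursor_mark,  # "*" means "start from the beginning"
        "wt": "json",
    }


def read_new_docs(
    fetch: Callable[[Dict[str, str]], dict],
    cursor_mark: str = "*",
    rows: int = 100,
) -> Tuple[List[dict], str]:
    """Read every doc after cursor_mark; return the docs and the mark to save.

    fetch is an assumed callable that performs the HTTP request and
    returns the decoded JSON response as a dict.
    """
    docs: List[dict] = []
    while True:
        response = fetch(cursor_params(cursor_mark, rows))
        docs.extend(response["response"]["docs"])
        next_mark = response["nextCursorMark"]
        # Solr returns the same mark it was given once nothing is left.
        if next_mark == cursor_mark:
            return docs, cursor_mark
        cursor_mark = next_mark
```

Persist the returned mark and pass it back in as cursor_mark on the next poll; that takes the place of remembering the last timestamp and id and building the "time>X or (time=X and uuid>Y)" filter by hand.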