Thanks for the in-depth explanation! The secondary sort by uuid only matters for ties, and that is exactly what I need. It would let me read a series of docs with identical times over multiple batches by filtering on time > timeOnLastReadDoc OR (time = timeOnLastReadDoc AND uuid > uuidOnLastReadDoc). That essentially gives every doc a unique sort key to track progress with.

On Sep 21, 2015 19:56, "Shawn Heisey" <apa...@elyograg.org> wrote:
> On 9/21/2015 9:01 AM, Gili Nachum wrote:
> > TimestampUpdateProcessorFactory takes place only on the leader shard, or on
> > each shard replica?
> > if on each replica then I would get different values on each replica.
> >
> > My alternative would be to perform secondary sort on a UUID to ensure order.
>
> If the update chain is configured properly, it runs on the leader, so
> all replicas get the same timestamp.
>
> Without SolrCloud, the way to create an "indexed at" time field is in
> the schema -- specify a default value of NOW on the field definition and
> don't send the field when indexing. The old master/slave replication
> copies the actual index contents, so the indexed values in all replicas
> are the same.
>
> The problem with NOW in the schema when running SolrCloud is that each
> replica indexes the document independently, so each replica can have a
> different timestamp. This is why the timestamp update processor exists
> -- to set the timestamp to a specific value before the document is
> duplicated to each replica, eliminating the problem.
>
> FYI, secondary sort parameters affect the order when the primary sort
> field is identical between two documents. It may not do what you are
> intending because of that.
>
> Thanks,
> Shawn
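The batching scheme from the top of this reply can be sketched in Python as below. This is a minimal illustration, assuming `time` and `uuid` are single-valued indexed fields and every request sorts by `time asc, uuid asc`. `build_fq`, `next_batch`, and `read_all` are hypothetical helpers: `next_batch` emulates the Solr request in memory rather than calling Solr, and the mixed-bracket range syntax (`{X TO *]` for an exclusive lower bound) is an assumption worth checking against your Solr version.

```python
from typing import List, Optional, Tuple

# (time, uuid). Timestamps are fixed-format ISO-8601 UTC strings, so plain
# string comparison matches chronological order.
Doc = Tuple[str, str]


def build_fq(last_time: str, last_uuid: str) -> str:
    """Hypothetical helper: Solr filter selecting docs after (last_time, last_uuid).

    '{X TO *]' excludes the lower bound X and leaves the upper end open.
    """
    return (f'time:{{{last_time} TO *] OR '
            f'(time:"{last_time}" AND uuid:{{{last_uuid} TO *])')


def next_batch(docs: List[Doc], last: Optional[Doc], rows: int) -> List[Doc]:
    """In-memory stand-in for one request with sort=time asc, uuid asc."""
    # Tuple ordering sorts on time first and uses uuid only to break ties.
    ordered = sorted(docs)
    if last is not None:
        last_time, last_uuid = last
        # Same condition that build_fq expresses, evaluated locally.
        ordered = [d for d in ordered
                   if d[0] > last_time or (d[0] == last_time and d[1] > last_uuid)]
    return ordered[:rows]


def read_all(docs: List[Doc], rows: int) -> List[Doc]:
    """Page through every doc, remembering only the last (time, uuid) read."""
    out: List[Doc] = []
    last: Optional[Doc] = None
    while True:
        batch = next_batch(docs, last, rows)
        if not batch:
            return out
        out.extend(batch)
        last = batch[-1]


if __name__ == "__main__":
    docs = [
        ("2015-09-21T10:00:00Z", "c"),
        ("2015-09-21T10:00:00Z", "a"),
        ("2015-09-21T09:00:00Z", "z"),
        ("2015-09-21T10:00:00Z", "b"),
        ("2015-09-21T11:00:00Z", "a"),
    ]
    print(read_all(docs, rows=2))
    print(build_fq("2015-09-21T10:00:00Z", "a"))
```

With `rows=2`, the three docs stamped `10:00:00Z` straddle two batches, yet each is returned exactly once, because the uuid breaks the tie at the batch boundary. The sketch assumes no document becomes visible later with a (time, uuid) key below the last one already read.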