Re: Solr 4.7.0 - cursorMark question

Greg Pendlebury Sun, 09 Mar 2014 11:13:32 -0700

That was really clear; I just had another read through of the documentation
with that explanation in mind and I can see I went off the rails.


Sorry for any confusion on my part, and thanks for the details.

Ta,
Greg


On 8 March 2014 08:36, Chris Hostetter <hossman_luc...@fucit.org> wrote:

>
> : Thank-you, that all sounds great. My assumption about documents being
> : missed was something like this:
>         ...
> : In that situation D would always be missed, whether the cursorMark 'C or
> : greater' or 'greater than B' (I'm not sure which it is in practice), simply
> : because the cursorMark is the unique ID and the unique ID is not your first
> : sort mechanism.
>
> First off: nothing about your example would result in "the cursorMark is
> the unique ID" ... let's clear that misconception up right away:
>
> Using Cursors requires a deterministic sort w/o any "ties" that can result
> in ambiguity.  For this reason (eliminating the ambiguity) it is necessary
> that the uniqueKey always be included in a sort -- but the cursorMark
> values that get computed are determined by *all* of the sort criteria used.
>
> So let's revisit your example, but let's make sure we are explicit about
> everything involved:
>
>  * A,B,C,D are all uniqueKey values in the "id" field
>  * 1,2,3.... are all time values in a "timestamp" field.
>  * we're going to use a "sort=timestamp asc, id asc" param in this example
>  * when we say "X(123)" we mean "Document with id 'X' which currently has
>    value '123' in the timestamp field"
>
> Let's suppose that at the start of the example, all of the docs in your
> example, in sorted order, look like this...
>
>   A(1), B(3), C(14), D(32)
>
> A client uses our sort, along with cursorMark=* & rows=2.  That client
> will get back A(1) and B(3) as well as some nextCursorMark value of "$%^"
> (deliberately not using any letters or numbers so as not to mislead you
> into thinking the cursorMark value is an id or a timestamp -- it's
> neither, it's an encoded binary value that has no meaning to the client
> other than as a "mark" to send back to the server).
>
> Now let's suppose that B & C are edited as you mention -- their new
> timestamp values must -- by definition -- be greater than D's existing
> timestamp value of "32" (otherwise it's not really a timestamp field).  So
> let's assume now that the total ordering of all our docs, using our sort,
> is:
>
>   A(1), D(32), B(56), C(57)
>
> After B & C are modified, the client makes a followup request using
> the same sort, rows=2, and cursorMark=$%^ (the nextCursorMark returned
> from the previous request).  The two documents the client will get this
> time are D(32) and B(56).
>
>  - "D" will never be skipped.
>  - "B" will be returned twice, because its timestamp
>    value was updated after it was fetched
>
> Does that make sense?
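
The walkthrough above can be replayed with a small in-memory sketch. This is not Solr code: it assumes the mark behaves like the full sort key (timestamp, id) of the last document returned, and that the next page holds everything sorting strictly after that key. The real cursorMark is an opaque encoded value, and the helper name `fetch_page` is hypothetical.

```python
def fetch_page(docs, cursor, rows=2):
    """Return (ids_on_page, next_cursor) for sort "timestamp asc, id asc".

    docs   -- dict mapping id -> current timestamp value
    cursor -- None stands in for cursorMark=*; otherwise the (timestamp, id)
              sort key of the last document already returned (an assumption
              made for this sketch, the real mark is opaque)
    """
    # Total ordering over *all* sort criteria: timestamp first, id as tie-breaker.
    ordered = sorted((ts, doc_id) for doc_id, ts in docs.items())
    if cursor is not None:
        # Keep only documents that sort strictly after the mark.
        ordered = [key for key in ordered if key > cursor]
    page = ordered[:rows]
    # With no more results, hand back the same mark that was sent in.
    next_cursor = page[-1] if page else cursor
    return [doc_id for _, doc_id in page], next_cursor


# Initial state: A(1), B(3), C(14), D(32)
docs = {"A": 1, "B": 3, "C": 14, "D": 32}
first_page, mark = fetch_page(docs, None)

# B and C are edited, so their timestamps move past D's: A(1), D(32), B(56), C(57)
docs["B"] = 56
docs["C"] = 57
second_page, mark = fetch_page(docs, mark)

print(first_page, second_page)
```

Under those assumptions the first page is A and B, and the second page is D and B: D is not skipped, and B shows up a second time because its sort key moved after it was fetched.
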
>
> You can try this out manually if you want to see it for yourself --
> either using a "real" auto-assigned timestamp field, or just using a
> simple numeric field you set yourself when updating docs.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
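
For anyone wanting to try this by hand, as suggested above, the requests could be built roughly like this. It is only a sketch: the base URL and collection name are placeholders, the helper names are hypothetical, and the match-all query is just a convenient default. The sort, rows and cursorMark parameters and the nextCursorMark response value are the ones discussed in this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint (an assumption); point this at your own Solr core.
BASE_URL = "http://localhost:8983/solr/collection1/select"


def build_cursor_url(cursor_mark="*", rows=2, sort="timestamp asc, id asc",
                     base_url=BASE_URL):
    """Build a select URL for one page of a cursor walk."""
    params = {
        "q": "*:*",                 # match everything, for the experiment
        "sort": sort,               # must include the uniqueKey field
        "rows": rows,
        "cursorMark": cursor_mark,  # "*" on the first request
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def fetch_page(cursor_mark="*", rows=2):
    """Fetch one page and return (docs, nextCursorMark). Needs a running Solr."""
    with urlopen(build_cursor_url(cursor_mark, rows)) as resp:
        body = json.load(resp)
    return body["response"]["docs"], body["nextCursorMark"]


# Building the follow-up request with the made-up mark from the example:
print(build_cursor_url("$%^"))
```

Note that the mark has to be URL-encoded when sent back, which `urlencode` takes care of here. Fetch the first page, update a couple of docs, then fetch again with the returned mark to see which docs come back.
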
