Re: SOLR Cursor Pagination Issue

Erick Erickson Mon, 28 Sep 2020 07:51:43 -0700

I said nothing about docId changing. _any_ sort criteria changing is an issue. 
You’re sorting by score. Well, as you index documents, the new docs change the 
values used to calculate scores for _all_ documents, thus changing 
the sort order and potentially causing unexpected results when using 
cursorMark. That said, I don’t think you’re getting any different scores at all 
if you’re really searching for “(* AND *)”. Try returning score in the fl list: 
are the scores different?
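
To make the "sort value changed under the cursor" effect concrete, here is a toy model in plain Python. It is not Solr code, just a sketch that assumes a cursor remembers only the sort values of the last document returned. The names next_page and crawl and the sample index are made up for illustration; the values mirror the date example quoted further down.

```python
# Toy model of cursor-style paging (not Solr code). The cursor stores only
# the sort values of the last document returned, never a list of documents.

def next_page(docs, cursor, rows):
    """Return (ids, new_cursor); docs maps id -> sort value, sorted value asc, id asc."""
    ordered = sorted((value, doc_id) for doc_id, value in docs.items())
    page = [key for key in ordered if cursor is None or key > cursor][:rows]
    new_cursor = page[-1] if page else cursor
    return [doc_id for _, doc_id in page], new_cursor


def crawl(docs, rows, update_after_first_page=None):
    """Page through docs; optionally apply one {id: new_value} update mid-crawl."""
    docs = dict(docs)  # work on a copy so the caller's data is untouched
    seen, cursor, pages_read = [], None, 0
    while True:
        page, cursor = next_page(docs, cursor, rows)
        if not page:
            return seen
        seen.extend(page)
        pages_read += 1
        if pages_read == 1 and update_after_first_page:
            docs.update(update_after_first_page)  # simulates indexing mid-crawl


index = {"doc1": 2001, "doc2": 2005, "doc3": 2010, "doc4": 2020}
stable = crawl(index, rows=2)
# doc4 moves behind the cursor after page one, so it is never returned.
skipped = crawl(index, rows=2, update_after_first_page={"doc4": 2001})
# doc1 moves ahead of the cursor after page one, so it is returned twice.
repeated = crawl(index, rows=2, update_after_first_page={"doc1": 2030})
```

With score as the first sort key, any indexing that shifts scores can have the same effect on documents on either side of the cursor.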


You still haven’t given an example of the results you’re seeing that are 
unexpected. And my assumption is that you are seeing odd results when you call 
this query again with a cursorMark returned by a previous call. Or are you 
saying that you don’t think facet.query is returning the correct count? Be 
aware that Solr doesn’t support true Boolean logic, see: 
https://lucidworks.com/post/why-not-and-or-and-not/
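
If it helps to produce that example, below is a rough sketch of a cursorMark loop that reports the counts you mentioned plus the distinct scores. It assumes fl includes both SERIES_ID and score, and that fetch is some function you supply that sends the params to your select handler and returns the parsed JSON. Both crawl_with_cursor and fetch are hypothetical names, not part of any Solr client.

```python
def crawl_with_cursor(fetch, base_params):
    """Follow nextCursorMark until it stops changing and summarize what came back."""
    params = dict(base_params, cursorMark="*")
    requests_sent, ids, scores = 0, [], set()
    while True:
        response = fetch(params)  # fetch() is supplied by the caller
        requests_sent += 1
        for doc in response["response"]["docs"]:
            ids.append(doc["SERIES_ID"])
            scores.add(doc.get("score"))
        next_mark = response["nextCursorMark"]
        if next_mark == params["cursorMark"]:
            break  # the same mark coming back means there is nothing left
        params["cursorMark"] = next_mark
    return {
        "requests": requests_sent,
        "collected": len(ids),
        "unique": len(set(ids)),
        "distinct_scores": scores,
    }
```

With the requests library, fetch could be something like lambda p: requests.get(select_url, params=p).json(), where select_url is whatever your collection's select URL is. If "collected" and "unique" differ, or "distinct_scores" has more than one value, that is the kind of detail worth posting.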

There’s special handling for the form “fq=NOT something” to change it to 
“fq=*:* NOT something” that’s not present in something like “q=NOT something”. 
How that plays in facet.query I’m not sure, but try “facet.query=*:* NOT 
something” if the facet count is what the problem is.
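
As a sketch of that experiment in a Python script: explicit_match_all below is a made-up helper, not a Solr API, that prepends *:* to a purely negative clause before it goes into facet.query. Whether it changes the count in your setup is exactly what the test would tell you.

```python
def explicit_match_all(clause):
    """Prefix a purely negative clause with *:* so it has a set to subtract from."""
    stripped = clause.strip()
    if stripped.startswith("NOT ") or stripped.startswith("-"):
        return "*:* " + stripped
    return stripped


params = {
    # The {!key=NUM_DOCS} local param is kept from the original request.
    "facet.query": "{!key=NUM_DOCS}" + explicit_match_all("NOT SERIES_ID:0"),
    "fq": "NOT SERIES_ID:0",  # fq already gets the special handling
}
```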

I have no idea what you’re trying to accomplish with (* AND *) unless those are 
just placeholders and you put real text in them. That’s rather odd. *:* is 
“select everything”...

BTW, returning 10,000 docs is somewhat of an anti-pattern. If you really 
require that many documents, consider streaming.
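
One way to read “streaming” here is Solr’s /export handler, which streams a fully sorted result set. The sketch below only builds the URL. The base URL and the collection name "series" are placeholders, and it assumes SERIES_ID has docValues, which as far as I know /export needs for the fl and sort fields. It also sorts on a field rather than score, since I don’t believe /export can sort by score.

```python
from urllib.parse import urlencode


def export_url(base_url, collection, query, fields, sort):
    """Build an /export request URL; both fl and sort are supplied explicitly."""
    params = {"q": query, "fl": ",".join(fields), "sort": sort}
    return f"{base_url}/{collection}/export?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
url = export_url("http://localhost:8983/solr", "series",
                 "*:*", ["SERIES_ID"], "SERIES_ID asc")
```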

> On Sep 28, 2020, at 10:21 AM, vmakov...@xbsoftware.by wrote:
> 
> Hi, Erick
> 
> I have a python script that sends requests with CursorMark. This script 
> checks data against the following Expected series criteria:
> Collected series:
> Number of requests:
> Collected unique series:
> The request looks like this: 
> select?indent=off&defType=edismax&wt=json&facet.query={!key=NUM_DOCS}NOT 
> SERIES_ID:0&fq=NOT 
> SERIES_ID:0&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&facet.limit=-1&q=(*
>  AND *)&qf=all_text_stemming all_text&fq=facet_db_code:( "CN" 
> )&fq=-SERIES_CODE:( "TEST" )&fl=SERIES_ID&sort=score desc,docId 
> asc&bq=SERIES_STATUS:T^5&bq=KEY_SERIES_FLAG:1^5&bq=accuracy_name:0&bq=SERIES_STATUS:C^-30&rows=10000&cursorMark=*
> 
> DocId does not change during data updates. During the data update process in 
> SolrCloud, the script returned an incorrect Number of requests and Collected series.
> 
> Best,
> Vlad
> 
> 
> 
> Mon, 28 Sep 2020 08:54:57 -0400, Erick Erickson <erickerick...@gmail.com> 
> wrote:
> 
>> Define “incorrect” please. Also, showing the exact query you use would be 
>> helpful.
>> That said, indexing data at the same time you are using CursorMark is not 
>> guaranteed to find all documents. Consider a sort with date asc, id asc. 
>> doc53 has a date of 2001 and you’ve already returned the doc.
>> Next, you update doc53 to 2020. It now appears sometime later in the results 
>> due to the changed data. Or the other way, doc53 starts with 2020, and while 
>> your cursormark label is in 2010, you change doc53 to have a date of 2001. 
>> It will never be returned.
>> Similarly for anything else you change that’s relevant to the sort criteria 
>> you’re using.
>> CursorMark doesn’t remember _documents_, just, well, call it the fingerprint 
>> (i.e. sort criteria values) of the last document returned so far.
>> Best,
>> Erick
>>> On Sep 28, 2020, at 3:32 AM, vmakov...@xbsoftware.by wrote:
>>> Good afternoon,
>>> Could you please suggest us a solution: during data updating process in 
>>> solrCloud, requests with cursor mark return incorrect data. I suppose that 
>>> the results do not follow each other during the indexation process, because 
>>> the data doesn't have enough time to be replicated between the nodes.
>>> Kind regards,
>>> Vladislav Makovski
> Vladislav Makovski
> Developer
> XB Software Ltd. | Minsk, Belarus
> Site: https://xbsoftware.com
> Skype: vlad__makovski
> Cell:  +37529 6484100
