https://bugzilla.wikimedia.org/show_bug.cgi?id=72128

            Bug ID: 72128
           Summary: CirrusSearch: Accelerated regex searches that stop
                    early do not signal that
           Product: MediaWiki extensions
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: CirrusSearch
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected],
                    [email protected]
       Web browser: ---
   Mobile Platform: ---

To keep load down on the search cluster, accelerated regex searches are only
allowed to recheck a limited number of documents (currently 10,000).  Once that
limit is reached, all subsequent documents are treated as non-matching, and
Cirrus gives the user no indication that this happened.  This means that
results are less reliable.  OTOH this should only happen if your regex can't be
accelerated down to a small subset of the wiki, which _should_ be reasonably
rare.  It happens if the regex really does match more documents than the
recheck limit, or if the regex is specific but the trigrams that we're able to
extract from it still match too many documents.

Examples:
insource:/ {{/ will match a ton of pages and underreport the number of matches
insource:/ {{..ca/ will match fewer pages, but the only trigram that can be
extracted from it (" {{") is still on too many pages
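
To see why the second example accelerates no better than the first, here is a
toy sketch of trigram extraction. It is not the CirrusSearch implementation:
the function name literal_trigrams is hypothetical, and it assumes "." is the
only metacharacter in the pattern, which is enough for the two regexes above.

```python
def literal_trigrams(pattern: str) -> set[str]:
    """Toy trigram extraction (assumption: '.' is the only metacharacter).

    Every run of literal characters contributes each of its 3-character
    windows; runs shorter than three characters contribute nothing.
    """
    trigrams = set()
    for run in pattern.split("."):
        for i in range(len(run) - 2):
            trigrams.add(run[i:i + 3])
    return trigrams


broad = literal_trigrams(" {{")
specific = literal_trigrams(" {{..ca")
```

Both patterns reduce to the same single trigram, because the trailing "ca" is
too short to form one, so the index cannot narrow the candidate set any further
for the more specific regex.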

The plan is to allow the recheck code to signal back to Cirrus that it gave up,
so Cirrus can let the user know that the results may not be consistent and tell
them how to fix their regex.  Unfortunately that first level of signalling
requires Elasticsearch 1.4, which isn't quite released yet.
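
A minimal sketch of the kind of signalling described above, assuming a recheck
loop that walks trigram candidates in order. Everything here (RecheckResult,
recheck, max_inspect, limit_reached) is a hypothetical name for illustration
and is not the CirrusSearch or Elasticsearch API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RecheckResult:
    matches: list = field(default_factory=list)  # ids of documents that matched
    rechecked: int = 0                           # documents actually inspected
    limit_reached: bool = False                  # True if we gave up early


def recheck(candidates, pattern, max_inspect=10_000):
    """Run the real regex over candidate (doc_id, source) pairs.

    Stops after max_inspect documents and records that it stopped, instead
    of silently treating the remaining candidates as non-matching.
    """
    regex = re.compile(pattern)
    result = RecheckResult()
    for doc_id, source in candidates:
        if result.rechecked >= max_inspect:
            # There is still at least one unchecked candidate: flag it.
            result.limit_reached = True
            break
        result.rechecked += 1
        if regex.search(source):
            result.matches.append(doc_id)
    return result


docs = [(i, "text {{template}}") for i in range(5)]
partial = recheck(docs, r" \{\{", max_inspect=3)
complete = recheck(docs, r" \{\{", max_inspect=5)
```

With the flag available, the caller can warn the user whenever limit_reached
is set. Note that the flag is only set when candidates remain unchecked, so a
search that inspects exactly max_inspect documents and runs out of candidates
is still reported as complete.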
