https://bugzilla.wikimedia.org/show_bug.cgi?id=72128
Bug ID: 72128
Summary: CirrusSearch: Accelerated regex searches that stop
early do not signal that
Product: MediaWiki extensions
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: CirrusSearch
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected], [email protected],
[email protected]
Web browser: ---
Mobile Platform: ---
In order to keep load down on the search cluster, accelerated regex searches
are only allowed to recheck a limited number of documents (10,000 right now).
When that limit is reached, all subsequent documents are treated as
non-matching, and Cirrus doesn't signal to the user at all that this happened.
This means that results are less reliable. OTOH this should only happen if
your regex can't be accelerated down to a small subset of the wiki, which
_should_ be reasonably rare. It happens if the regex actually matches more
documents than the recheck limit, or if it is specific but the trigrams that
we're able to extract from it still match too many documents.
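To make the failure mode concrete, here is a minimal Python sketch of a capped
recheck loop. It is not the CirrusSearch or Elasticsearch code; the function
name recheck_with_limit and the (doc_id, text) candidate format are
assumptions made for illustration only.

```python
import re
from typing import Iterable, List, Tuple


def recheck_with_limit(
    candidates: Iterable[Tuple[int, str]], pattern: str, limit: int
) -> List[int]:
    """Run the regex over at most `limit` candidate documents.

    Hypothetical stand-in for the recheck step: candidates past the limit
    are silently treated as non-matching, which is the behaviour this bug
    describes.
    """
    regex = re.compile(pattern)
    matches = []
    for checked, (doc_id, text) in enumerate(candidates):
        if checked >= limit:
            # Limit reached: stop without telling the caller anything.
            break
        if regex.search(text):
            matches.append(doc_id)
    return matches
```

With a limit of 2, a matching document in third or later position is dropped,
and the caller cannot tell that from the returned list alone.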
Example:
insource:/ {{/ will match a ton of pages and underreport the number of matches.
insource:/ {{..ca/ will match fewer pages, but the only trigram that can be
extracted from it (" {{") is still on too many pages.
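The trigram point can be illustrated with a deliberately simplified Python
sketch. It assumes that "." is the only regex metacharacter and treats
everything else as literal text, which is far cruder than what the real
acceleration code does; literal_trigrams is a hypothetical helper, not part
of CirrusSearch.

```python
from typing import Set


def literal_trigrams(pattern: str) -> Set[str]:
    """Collect every trigram found in the literal runs of `pattern`.

    Simplifying assumption: "." is the only wildcard, so splitting on it
    yields the literal runs. Runs shorter than 3 characters give nothing.
    """
    trigrams = set()
    for run in pattern.split("."):
        for i in range(len(run) - 2):
            trigrams.add(run[i:i + 3])
    return trigrams
```

Under that assumption, both " {{" and " {{..ca" yield only the trigram " {{":
the trailing "ca" is too short to form a trigram, so it cannot narrow the set
of documents to recheck.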
The plan is to allow the recheck code to signal back to Cirrus that it gave
up, so that Cirrus can let the user know that the results may be incomplete
and tell them how to fix their regex. Unfortunately that first level of
signalling requires Elasticsearch 1.4, which isn't quite released yet.
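One possible shape for that signal, sketched in Python under stated
assumptions: the recheck returns a result object carrying a gave_up flag, and
the caller turns the flag into a warning. RecheckResult, recheck_and_signal,
user_warning and the warning text are all hypothetical; the actual mechanism
depends on what Elasticsearch 1.4 exposes and is not described in this report.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class RecheckResult:
    matches: List[int] = field(default_factory=list)
    gave_up: bool = False  # True if candidates were left unchecked


def recheck_and_signal(
    candidates: Iterable[Tuple[int, str]], pattern: str, limit: int
) -> RecheckResult:
    """Capped recheck that records whether it stopped early."""
    regex = re.compile(pattern)
    result = RecheckResult()
    for checked, (doc_id, text) in enumerate(candidates):
        if checked >= limit:
            # At least one candidate remains unchecked: report it.
            result.gave_up = True
            break
        if regex.search(text):
            result.matches.append(doc_id)
    return result


def user_warning(result: RecheckResult) -> Optional[str]:
    """Return a message for the user when the result set may be incomplete."""
    if result.gave_up:
        return ("Regex search stopped early; results may be incomplete. "
                "Try a more specific regex.")
    return None
```

The flag is only set when a candidate beyond the limit actually exists, so a
search that happens to check exactly the limit produces no warning.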
--
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l