[MediaWiki-commits] [Gerrit] operations/puppet[production]: Report partial result from mwgrep
Gehel has submitted this change and it was merged.

Change subject: Report partial result from mwgrep
......................................................................

Report partial result from mwgrep

mwgrep can currently return partial results by hitting the max_inspect
limit, but doesn't tell the user anything about it (because
elasticsearch doesn't tell us it hit the max_inspect limit). Rather than
using an arbitrary document limit, use the timeout to restrict how long
a query can run. When the query hits the timeout, inform the user.

The difference between the timeout-based total results and the old
max_inspect behaviour can be seen with this example in prod (a short
timeout may need to be set to trigger the partial-results path):

    mwgrep --user '[a-z]*'

Bug: T127788
Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
---
M modules/scap/files/mwgrep
1 file changed, 16 insertions(+), 2 deletions(-)

Approvals:
  Gehel: Looks good to me, approved
  DCausse: Looks good to me, but someone else must approve
  jenkins-bot: Verified

diff --git a/modules/scap/files/mwgrep b/modules/scap/files/mwgrep
index c277598..0b4ce44 100755
--- a/modules/scap/files/mwgrep
+++ b/modules/scap/files/mwgrep
@@ -101,8 +101,8 @@
         'regex': args.term,
         'field': 'source_text',
         'ngram_field': 'source_text.trigram',
-        'max_inspect': 1,
         'max_determinized_states': 2,
+        'max_expand': 10,
         'case_sensitive': True,
         'locale': 'en',
     }},
@@ -129,7 +129,8 @@
     uri = BASE_URI + '?' + urllib.urlencode(query)
     try:
         req = urllib2.urlopen(uri, json.dumps(search))
-        result = json.load(req)['hits']
+        full_result = json.load(req)
+        result = full_result['hits']

         private_wikis = open('/srv/mediawiki/dblists/private.dblist').read().splitlines()

@@ -156,6 +157,19 @@
         print('')
         print('(total: %s, shown: %s)' % (result['total'], len(result['hits'])))

+        if full_result['timed_out']:
+            print("""
+The query was unable to complete within the allotted time. Only partial results
+are shown here, and the reported total hits is <= the true value. To speed up
+the query:
+
+* Ensure the regular expression contains one or more sets of 3 contiguous
+  characters. A character range ([a-z]) won't be expanded to count as
+  contiguous if it matches more than 10 characters.
+* Use a simpler regular expression. Consider breaking the query up into
+  multiple queries where possible.
+""")
+
     except urllib2.HTTPError, error:
         try:
             error_body = json.load(error)

--
To view, visit https://gerrit.wikimedia.org/r/307652
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson
Gerrit-Reviewer: DCausse
Gerrit-Reviewer: EBernhardson
Gerrit-Reviewer: Gehel
Gerrit-Reviewer: jenkins-bot <>
_______________________________________________
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
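The core of the change — keeping the full decoded response so the top-level `timed_out` flag can be inspected alongside `hits` — can be sketched in isolation. This is an illustrative example, not the actual mwgrep code: `report_hits` and the fabricated response body are hypothetical, and only the response shape (`timed_out`, `hits.total`, `hits.hits`) is taken from the patch.

```python
import json


def report_hits(raw_response):
    """Summarize an Elasticsearch response body, flagging partial results.

    Mirrors the approach in the patch: decode the whole response rather
    than just ['hits'], so the top-level 'timed_out' flag is available.
    (Illustrative sketch only, not the real mwgrep code.)
    """
    full_result = json.loads(raw_response)
    result = full_result['hits']
    summary = 'total: %s, shown: %s' % (result['total'], len(result['hits']))
    if full_result['timed_out']:
        # The query hit its timeout: the reported total is a lower bound.
        summary += ' (partial: query timed out, true total may be higher)'
    return summary


# Fabricated response body for demonstration:
body = json.dumps({'timed_out': True,
                   'hits': {'total': 42, 'hits': [{}, {}]}})
print(report_hits(body))
# -> total: 42, shown: 2 (partial: query timed out, true total may be higher)
```

The point of keeping `full_result` rather than indexing straight into `['hits']` is exactly this: `timed_out` lives at the top level of the response, so the old one-liner threw that information away.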
[MediaWiki-commits] [Gerrit] operations/puppet[production]: Report partial result from mwgrep
EBernhardson has uploaded a new change for review.

https://gerrit.wikimedia.org/r/307652

Change subject: Report partial result from mwgrep
......................................................................

Report partial result from mwgrep

mwgrep can currently return partial results by hitting the max_inspect
limit, but doesn't tell the user anything about it (because
elasticsearch doesn't tell us it hit the max_inspect limit). Rather than
using an arbitrary document limit, use the timeout to restrict how long
a query can run. When the query hits the timeout, inform the user.

The difference between the timeout-based total results and the old
max_inspect behaviour can be seen with this example in prod (a short
timeout may need to be set to trigger the partial-results path):

    mwgrep --user '[a-z]*'

Bug: T127788
Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
---
M modules/scap/files/mwgrep
1 file changed, 14 insertions(+), 2 deletions(-)

git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/52/307652/1

diff --git a/modules/scap/files/mwgrep b/modules/scap/files/mwgrep
index c277598..89845d0 100755
--- a/modules/scap/files/mwgrep
+++ b/modules/scap/files/mwgrep
@@ -101,7 +101,6 @@
         'regex': args.term,
         'field': 'source_text',
         'ngram_field': 'source_text.trigram',
-        'max_inspect': 1,
         'max_determinized_states': 2,
         'case_sensitive': True,
         'locale': 'en',
@@ -129,7 +128,8 @@
     uri = BASE_URI + '?' + urllib.urlencode(query)
     try:
         req = urllib2.urlopen(uri, json.dumps(search))
-        result = json.load(req)['hits']
+        full_result = json.load(req)
+        result = full_result['hits']

         private_wikis = open('/srv/mediawiki/dblists/private.dblist').read().splitlines()

@@ -156,6 +156,18 @@
         print('')
         print('(total: %s, shown: %s)' % (result['total'], len(result['hits'])))

+        if full_result['timed_out']:
+            print("""
+The query was unable to complete within the allotted time. Only partial results
+are shown here, and the reported total hits is <= the true value. To speed up the query:
+
+* Ensure the regular expression contains one or more sets of 3 contiguous
+  characters. A character range ([a-z]) won't be expanded to count as contiguous
+  if it matches more than 3 characters.
+* Use a simpler regular expression where possible. Consider breaking the query up
+  into multiple queries if necessary.
+""")
+
     except urllib2.HTTPError, error:
         try:
             error_body = json.load(error)

--
To view, visit https://gerrit.wikimedia.org/r/307652
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson
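The first tip in the timeout message refers to the trigram (`source_text.trigram`) index: a query can only be accelerated if required three-character substrings can be extracted from the regex, and a character class like `[a-z]` expands to too many alternatives to contribute any. A simplified, hypothetical sketch of that idea — `literal_trigrams` is not the real extraction algorithm, which also handles alternation and expands small character classes up to a limit such as the patch's `max_expand`:

```python
import re


def literal_trigrams(pattern):
    """Collect trigrams from literal runs in a regex (simplified sketch).

    Character classes (with an optional quantifier) and escape sequences
    are blanked out first, since they don't yield fixed 3-char substrings
    in this toy model; then every maximal literal run contributes its
    3-character windows. Illustration only, not the actual algorithm.
    """
    stripped = re.sub(r'\[[^\]]*\][*+?]?|\\.', ' ', pattern)
    trigrams = set()
    for run in re.findall(r'[A-Za-z0-9_]+', stripped):
        for i in range(len(run) - 2):
            trigrams.add(run[i:i + 3])
    return trigrams
```

Under this model, `literal_trigrams('wgCityId')` yields six trigrams the index can intersect, while `literal_trigrams('[a-z]*')` yields none — which is why the example query `mwgrep --user '[a-z]*'` cannot be accelerated and must scan until the timeout.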