[MediaWiki-commits] [Gerrit] operations/puppet[production]: Report partial result from mwgrep

2016-09-09 Thread Gehel (Code Review)
Gehel has submitted this change and it was merged.

Change subject: Report partial result from mwgrep
..


Report partial result from mwgrep

mwgrep can currently return partial results by hitting the max_inspect
limit, but it doesn't tell the user anything about it (because elasticsearch
doesn't tell us it hit the max_inspect limit). Rather than using an
arbitrary document limit, use the timeout to restrict how long a query
can run. When the query hits the timeout, inform the user. The difference
between timeout-based total results and the old max_inspect can be
seen with this example in prod. A short timeout may need to be added
to trigger the partial-results path.

  mwgrep --user '[a-z]*'

Bug: T127788
Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
---
M modules/scap/files/mwgrep
1 file changed, 16 insertions(+), 2 deletions(-)

Approvals:
  Gehel: Looks good to me, approved
  DCausse: Looks good to me, but someone else must approve
  jenkins-bot: Verified



diff --git a/modules/scap/files/mwgrep b/modules/scap/files/mwgrep
index c277598..0b4ce44 100755
--- a/modules/scap/files/mwgrep
+++ b/modules/scap/files/mwgrep
@@ -101,8 +101,8 @@
 'regex': args.term,
 'field': 'source_text',
 'ngram_field': 'source_text.trigram',
-'max_inspect': 1,
 'max_determinized_states': 2,
+'max_expand': 10,
 'case_sensitive': True,
 'locale': 'en',
 }},
@@ -129,7 +129,8 @@
 uri = BASE_URI + '?' + urllib.urlencode(query)
 try:
 req = urllib2.urlopen(uri, json.dumps(search))
-result = json.load(req)['hits']
+full_result = json.load(req)
+result = full_result['hits']
 
 private_wikis = open('/srv/mediawiki/dblists/private.dblist').read().splitlines()
 
@@ -156,6 +157,19 @@
 
 print('')
 print('(total: %s, shown: %s)' % (result['total'], len(result['hits'])))
+if full_result['timed_out']:
+print("""
+The query was unable to complete within the allotted time. Only partial results
+are shown here, and the reported total hits is <= the true value. To speed up
+the query:
+
+* Ensure the regular expression contains one or more sets of 3 contiguous
+  characters. A character range ([a-z]) won't be expanded to count as
+  contiguous if it matches more than 10 characters.
+* Use a simpler regular expression. Consider breaking the query up into
+  multiple queries where possible.
+""")
+
 except urllib2.HTTPError, error:
 try:
 error_body = json.load(error)

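For reference, a minimal sketch of the partial-result check this patch adds. The payload below is invented for illustration; it only mirrors the top-level `timed_out` flag and `hits` structure of an Elasticsearch search response, and the warning wording is paraphrased from the patch:

```python
import json

# Warning shown when Elasticsearch reports the query timed out (paraphrased;
# the real script prints a longer message with tuning advice).
PARTIAL_RESULT_WARNING = (
    'The query was unable to complete within the allotted time. '
    'Only partial results are shown, and the reported total is a lower bound.'
)


def summarize(response_body):
    """Return the totals line, plus a warning if results are partial."""
    full_result = json.loads(response_body)
    result = full_result['hits']
    lines = ['(total: %s, shown: %s)' % (result['total'], len(result['hits']))]
    # Elasticsearch sets the top-level timed_out flag when the per-request
    # timeout fired before all shards finished inspecting documents.
    if full_result.get('timed_out'):
        lines.append(PARTIAL_RESULT_WARNING)
    return '\n'.join(lines)


# Invented sample payload in the shape of an Elasticsearch search response.
sample = json.dumps({
    'timed_out': True,
    'hits': {'total': 42, 'hits': [{'_id': 'Module:Foo'}, {'_id': 'Module:Bar'}]},
})
print(summarize(sample))
```
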
-- 
To view, visit https://gerrit.wikimedia.org/r/307652
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
Gerrit-PatchSet: 3
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson 
Gerrit-Reviewer: DCausse 
Gerrit-Reviewer: EBernhardson 
Gerrit-Reviewer: Gehel 
Gerrit-Reviewer: jenkins-bot <>

___
MediaWiki-commits mailing list
MediaWiki-commits@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits


[MediaWiki-commits] [Gerrit] operations/puppet[production]: Report partial result from mwgrep

2016-08-30 Thread EBernhardson (Code Review)
EBernhardson has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/307652

Change subject: Report partial result from mwgrep
..

Report partial result from mwgrep

mwgrep can currently return partial results by hitting the max_inspect
limit, but it doesn't tell the user anything about it (because elasticsearch
doesn't tell us it hit the max_inspect limit). Rather than using an
arbitrary document limit, use the timeout to restrict how long a query
can run. When the query hits the timeout, inform the user. The difference
between timeout-based total results and the old max_inspect can be
seen with this example in prod. A short timeout may need to be added
to trigger the partial-results path.

  mwgrep --user '[a-z]*'

Bug: T127788
Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
---
M modules/scap/files/mwgrep
1 file changed, 14 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/operations/puppet refs/changes/52/307652/1

diff --git a/modules/scap/files/mwgrep b/modules/scap/files/mwgrep
index c277598..89845d0 100755
--- a/modules/scap/files/mwgrep
+++ b/modules/scap/files/mwgrep
@@ -101,7 +101,6 @@
 'regex': args.term,
 'field': 'source_text',
 'ngram_field': 'source_text.trigram',
-'max_inspect': 1,
 'max_determinized_states': 2,
 'case_sensitive': True,
 'locale': 'en',
@@ -129,7 +128,8 @@
 uri = BASE_URI + '?' + urllib.urlencode(query)
 try:
 req = urllib2.urlopen(uri, json.dumps(search))
-result = json.load(req)['hits']
+full_result = json.load(req)
+result = full_result['hits']
 
 private_wikis = open('/srv/mediawiki/dblists/private.dblist').read().splitlines()
 
@@ -156,6 +156,18 @@
 
 print('')
 print('(total: %s, shown: %s)' % (result['total'], len(result['hits'])))
+if full_result['timed_out']:
+print("""
+The query was unable to complete within the allotted time. Only partial results
+are shown here, and the reported total hits is <= the true value. To speed up
+the query:
+
+* Ensure the regular expression contains one or more sets of 3 contiguous
+  characters. A character range ([a-z]) won't be expanded to count as
+  contiguous if it matches more than 3 characters.
+* Use a simpler regular expression where possible. Consider breaking the query
+  up into multiple queries if necessary.
+""")
+
 except urllib2.HTTPError, error:
 try:
 error_body = json.load(error)


Gerrit-MessageType: newchange
Gerrit-Change-Id: Id95ba3f8df1bca2e2f089525bf7aa061ddbc1e2b
Gerrit-PatchSet: 1
Gerrit-Project: operations/puppet
Gerrit-Branch: production
Gerrit-Owner: EBernhardson 
