https://bugzilla.wikimedia.org/show_bug.cgi?id=72381

Nik Everett <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from Nik Everett <[email protected]> ---
Technically this is already implemented with Cirrus's source regex queries, but
it's super duper slow in production now.  Right now the default implementation
is to brute-force run the regex over all the pages.  That takes about 10
minutes on enwiki if you can't reduce the set of considered pages some other
way (title filter, other required text, smaller namespace, etc).  After about a
minute of waiting on the search, Varnish normally chops the request and sends
you a timeout, which is pretty lame.  So 10 minutes of compute time get wasted
(kinda; we mitigate it a bit, but it's still lame).

Anyway, we're in the process of deploying trigram-accelerated regex searches so
we only actually have to run the regexes on pages that have a chance of
matching the regex in the first place.  In the common case it's something like
60 times faster than the brute force, so roughly 10 seconds instead of 10
minutes.  10 seconds is OK to wait, if not great.  In the worst case we
actually cut the query off at some point and don't let it take any more time.
This can cause weird results (Bug 72128), but at least you get results at all
rather than waiting forever.
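
To illustrate the idea of only running the regex on pages that could possibly
match, here is a minimal Python sketch.  It is not Cirrus's actual
implementation.  The names (trigrams, build_index, accelerated_search) and the
required_literal parameter are all hypothetical; a real implementation would
derive the required trigrams from the regex itself rather than ask the caller
for a literal.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(pages):
    """Map each trigram to the set of page titles whose text contains it."""
    index = defaultdict(set)
    for title, body in pages.items():
        for gram in trigrams(body):
            index[gram].add(title)
    return index


def accelerated_search(pages, index, pattern, required_literal):
    """Run pattern only on pages containing every trigram of required_literal.

    required_literal is an assumption of this sketch: a string that every
    match of pattern must contain.  If it is shorter than 3 characters it
    yields no trigrams and the search falls back to brute force.
    Returns (sorted matching titles, number of pages the regex was run on).
    """
    candidates = set(pages)
    for gram in trigrams(required_literal):
        # A page missing any required trigram cannot match, so drop it.
        candidates &= index.get(gram, set())
    regex = re.compile(pattern)
    matches = sorted(t for t in candidates if regex.search(pages[t]))
    return matches, len(candidates)


if __name__ == "__main__":
    pages = {
        "A": "the quick brown fox",
        "B": "lazy dogs sleep",
        "C": "quick silver",
    }
    index = build_index(pages)
    print(accelerated_search(pages, index, r"qu[a-z]+k\s+brown", "brown"))
```

The trigram filter can only rule pages out, so the regex still has to be run
on every candidate; the speedup comes from the candidate set usually being far
smaller than the whole wiki.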

The trigram searches aren't the default yet because we haven't built the
trigram index for all the wikis.  The plan is to make them the default once the
trigram index is built for all the wikis, which will take another few days.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
