On Mon, Jul 6, 2009 at 7:43 AM, Andrew Garrett<[email protected]> wrote: > Yes. > > We wouldn't allow direct searching from the web interface with regexes > for two related reasons: > > 1/ A single search with the most basic of regexes would take several > minutes, if not hours. It isn't computationally trivial to search for > a small string in a very large string of over 10 GB, let alone a > regex. Words can be indexed, regexes cannot. > > 2/ Even if we could find a way to make the former performant, a > malicious regex could significantly expand this time taken, leading to > a denial of service.
I seem to recall Gregory Maxwell describing a setup that made this feasible, given the appropriate amount of dedicated hardware. It was run with the entire database in memory; it only permitted "real" regular expressions (compilable to finite-state machines, no backreferences etc.); and it limited the length of the finite-state machine generated. Running a regex took several minutes, but he'd run a lot of them in parallel, since it was mostly memory-bound, so he got fairly good throughout. Something like that. But probably not practical without an undue amount of effort and hardware, yeah. :) _______________________________________________ Wikitech-l mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikitech-l
