On Tuesday, 11 October 2011 10:17:23 Mark wrote:
> Found something that is probably interesting for Thiago with his high
> performance blog posts -- i like those posts a lot btw -- :)
> I came across this link:
>
> http://blog.phusion.nl/2010/12/06/efficient-substring-searching/
> source: https://github.com/FooBarWidget/boyer-moore-horspool
>
> That (Boyer-Moore or Boyer-Moore-Horspool) is probably very interesting in
> speeding up string matching anywhere in Qt. If used in RE2 it would probably
> speed it up a lot as well. I don't have numbers nor did i test it.. Just
> assuming it ^_^
I believe that's the algorithm implemented in QStringMatcher, which
QString::indexOf uses only if the string is much larger than the substring
being searched.
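For anyone who hasn't looked at the algorithm, here is a minimal sketch of
Boyer-Moore-Horspool over UTF-16 code units. This is not the QStringMatcher
code; the function name horspoolFind and the use of a hash map for the skip
table are my own assumptions for illustration (a real implementation would
more likely use a fixed-size table indexed by a folded character value).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: returns the index of the first occurrence of
// `needle` in `haystack`, or std::u16string::npos if there is none.
std::size_t horspoolFind(const std::u16string &haystack,
                         const std::u16string &needle)
{
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0)
        return 0;
    if (m > n)
        return std::u16string::npos;

    // Skip table: for every code unit in needle[0 .. m-2], the distance
    // from its last occurrence to the end of the needle.
    std::unordered_map<char16_t, std::size_t> skip;
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[needle[i]] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the current window against the needle, right to left.
        std::size_t j = m;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1])
            --j;
        if (j == 0)
            return pos;

        // Shift by the skip value of the window's last code unit;
        // code units that do not occur in the needle skip a full length.
        const auto it = skip.find(haystack[pos + m - 1]);
        pos += (it == skip.end()) ? m : it->second;
    }
    return std::u16string::npos;
}
```

The skip table costs time to build, which is one plausible reason to use it
only when the haystack is much larger than the needle.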
As for QRegExp, I sat down with Lars and João yesterday and we discussed it
a bit. We don't know what to do with it because we want to do all of the
following at the same time:
* move the current engine out
* use a high-performance engine in QtCore
* not increase the footprint of QtCore by too much
* not restrict the platforms unnecessarily
* avoid code duplication
* avoid converting from UTF-16 to UTF-8 or, worse, to the local 8-bit encoding
We're not going to get them all, that's for sure. On one hand, the V8 engine
is very performant, works on UTF-16 and avoids code duplication, but it
increases the footprint and restricts the platforms addressed. On the other,
PCRE is performant too, works almost everywhere and is small, but requires
UTF-8←→UTF-16 conversion.
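To make the conversion cost concrete, here is a rough sketch of the
UTF-16 to UTF-8 step that would have to run on the subject string before an
8-bit engine could see it. The function name utf16ToUtf8 is hypothetical, and
replacing unpaired surrogates with U+FFFD is my assumption; this is not what
Qt or PCRE actually do internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a UTF-16 string as UTF-8.
// Unpaired surrogates are replaced with U+FFFD (an assumption).
std::string utf16ToUtf8(const std::u16string &in)
{
    std::string out;
    out.reserve(in.size() * 3); // upper bound: 3 bytes per code unit

    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

That is a full pass over the string plus an allocation for every match
attempt, and the offsets the engine reports are then in bytes, so they have
to be mapped back to UTF-16 indices as well.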
I believe the standard WebKit has a PCRE engine inside, modified to work on
UTF-16. That's also an option, but it means code duplication and we would
have to maintain it ourselves.
So maybe the solution is a hybrid: dlopen V8 where it is available, fall back
to libpcre otherwise. And crash if none is found. That means using regexps
will cause a library to be loaded, one that can be as big as V8.
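Roughly, the selection logic I have in mind looks like the sketch below. All
names here (RegexpBackend, selectBackend) are made up for illustration. The
tryLoad callback stands in for whatever actually loads the library (dlopen,
QLibrary or similar), and the sketch throws where the real thing would
probably just abort.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one candidate engine.
struct RegexpBackend {
    std::string name;               // e.g. "v8" or "pcre"
    std::function<bool()> tryLoad;  // returns true if the library loaded
};

// Walks the candidates in priority order and returns the name of the
// first one that loads. Throws if none is available (stand-in for the
// "crash" case).
std::string selectBackend(const std::vector<RegexpBackend> &candidates)
{
    for (const RegexpBackend &backend : candidates) {
        if (backend.tryLoad && backend.tryLoad())
            return backend.name;
    }
    throw std::runtime_error("no regular expression backend available");
}
```

The candidate list would put V8 first and libpcre second, so the heavier
library is only used where it is actually installed.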
What does everyone think?
People used to say:
You had a problem and you used regular expressions. Now you have two.
With Qt 5, that will be three. :-)
(But you deserve it if you're using regular expressions for trivial things)
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
