On Tuesday, 11 October 2011 10:17:23 Mark wrote:
> Found something that is probably interesting for Thiago with his high
> performance blog posts -- i like those posts a lot btw -- :)
> I came across this link:
>
> http://blog.phusion.nl/2010/12/06/efficient-substring-searching/
> source: https://github.com/FooBarWidget/boyer-moore-horspool
>
> That (Boyer-Moore or Boyer-Moore-Horspool) is probably very interesting in
> speeding up string matching anywhere in Qt. If used in RE2 it would probably
> speed it up a lot as well. I don't have numbers nor did i test it.. Just
> assuming it ^_^

I believe that's the algorithm implemented in QStringMatcher, which
QString::indexOf uses only if the string is much larger than the substring
being searched.
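
For anyone who hasn't looked at the algorithm, here is a minimal sketch of
Boyer-Moore-Horspool over UTF-16 code units. It is an illustration only, not
the QStringMatcher source: the function name bmhIndexOf is made up, it uses
std::u16string rather than QString, and folding the skip table down to 256
entries is a choice made for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper, not Qt API: Boyer-Moore-Horspool search over UTF-16
// code units. Returns the index of the first match, or -1 if there is none.
std::ptrdiff_t bmhIndexOf(const std::u16string &haystack,
                          const std::u16string &needle)
{
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0)
        return 0;
    if (m > n)
        return -1;

    // Skip table indexed by the low 8 bits of a code unit. Distinct code
    // units may share a slot; that can only make a skip shorter, so no
    // match is ever jumped over.
    std::array<std::size_t, 256> skip;
    skip.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        skip[needle[i] & 0xff] = m - 1 - i;

    std::size_t pos = 0;
    while (pos <= n - m) {
        if (haystack.compare(pos, m, needle) == 0)
            return static_cast<std::ptrdiff_t>(pos);
        // Shift by the distance stored for the last code unit in the window.
        pos += skip[haystack[pos + m - 1] & 0xff];
    }
    return -1;
}
```

Building the table costs time proportional to the needle length plus the
fixed 256-entry fill, which is why it only pays off when the haystack is much
larger than the needle.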

As for QRegExp, I sat down with Lars and João yesterday and we discussed it a
bit. We don't know what to do with it because we want to, all at the same
time:

 * move the current engine out
 * use a high-performance engine in QtCore
 * not increase the footprint of QtCore by too much
 * not restrict the platforms unnecessarily
 * avoid code duplication
 * avoid converting from UTF-16 to UTF-8 or, worse, local 8 bit

We're not going to get them all, that's for sure. On one hand, the V8 engine
is very performant, works on UTF-16 and avoids code duplication, but it
increases the footprint and restricts the platforms addressed. On the other,
PCRE is performant too, works almost everywhere and is small, but requires
UTF-8←→UTF-16 conversion.

I believe the standard WebKit has a PCRE engine inside, modified to work on
UTF-16. That's also an option, but it is code duplication and causes us to
have to maintain it.

So maybe the solution is a hybrid: dlopen V8 where it is available, fall back
to libpcre otherwise. And crash if none is found. That means using regexps
will cause a library to be loaded, one that can be as big as V8.
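
To make the hybrid idea concrete, the selection logic could look roughly like
the sketch below. Everything in it is hypothetical: the RegexBackend enum, the
pickRegexBackend function and the library names are placeholders, and the
probe is passed in as a callable so the example stays self-contained. In a
real build the probe would presumably wrap dlopen() or QLibrary.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical names throughout; none of this is existing Qt API.
enum class RegexBackend { V8, Pcre, None };

struct Candidate {
    RegexBackend backend;
    std::string library; // assumed file name, would differ per platform
};

// Tries the candidates in order of preference and returns the first backend
// whose library the probe reports as loadable. The probe stands in for a
// dlopen() call.
RegexBackend pickRegexBackend(
    const std::function<bool(const std::string &)> &canLoad)
{
    const std::vector<Candidate> candidates = {
        {RegexBackend::V8, "libv8.so"},     // preferred where available
        {RegexBackend::Pcre, "libpcre.so"}, // fallback
    };
    for (const Candidate &c : candidates) {
        if (canLoad(c.library))
            return c.backend;
    }
    // Nothing could be loaded; the caller decides whether to abort.
    return RegexBackend::None;
}
```

The sketch returns None instead of crashing, so the "crash if none is found"
policy stays a decision of the caller.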

What does everyone think?

People used to say:
        You had a problem and you used regular expressions. Now you have two.

With Qt 5, that will be three. :-)

(But you deserve it if you're using regular expressions for trivial things)

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358


_______________________________________________
Qt5-feedback mailing list
[email protected]
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback