On Friday, 30 September 2011 12:57:28 Kent Hansen wrote:
> Hi,
> You might have seen Thiago's blog about QStringLiteral [1], and his idea
> on replacing QLatin1String usage by QStringLiteral in Qt (where possible).
>
> I like the idea, but wanted to do some benchmarking first to get an
> impression of the performance impact. [2]
>
> My results so far (on Linux 32-bit) indicate that QString::appends are
> way faster when switching to using QStringLiteral: 7x faster than
> QLatin1String for a 2-character literal and 14x for a ~50-character literal.

Not unexpected. The conversion from Latin 1 to UTF-16 required for the
appending needs to be done at runtime for a QLatin1String, whereas the
compiler does it with QStringLiteral.
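
To make the difference concrete, here is a minimal sketch in plain standard C++, not Qt's actual code. The names fromLatin1 and kGreeting are made up for illustration. The first function does the per-character widening that has to happen at runtime for Latin 1 input; the array below it shows the compiler emitting UTF-16 data directly, which is roughly what a UTF-16 literal buys you.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt function): widen Latin-1 bytes to UTF-16.
// Every Latin-1 byte maps to the UTF-16 code unit with the same numeric
// value, so the work is a copy with zero-extension, done once per call.
std::u16string fromLatin1(const char *latin1, std::size_t len)
{
    assert(latin1 != nullptr || len == 0);
    std::u16string out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(latin1[i])));
    return out;
}

// Compile-time alternative: the compiler stores the UTF-16 code units in
// the binary, so no conversion loop runs when the data is used.
static const char16_t kGreeting[] = u"hello";
```

The cast through unsigned char matters: without it, bytes above 0x7F would be sign-extended on platforms where char is signed.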
And that's assuming you didn't get an implicit conversion to QString
somewhere. QString::append has a QLatin1String overload, but some methods
don't, and those cause a temporary to be created, which involves a malloc
(non-deterministic time).
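
A rough model of that overload problem, using two hypothetical stand-in types (Latin1View and Utf16String) rather than the real Qt classes. The conversions counter is only there to make the hidden temporary visible. Whether the temporary really hits the heap depends on the string implementation; in this sketch std::u16string may keep short strings inline.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a Latin-1 string view.
struct Latin1View {
    const char *data;
    std::size_t size;
    explicit Latin1View(const char *s) : data(s), size(std::strlen(s)) {}
};

// Hypothetical stand-in for a UTF-16 string class.
struct Utf16String {
    std::u16string units;
    static int conversions;   // counts implicit Latin-1 conversions

    Utf16String() = default;

    // Implicit converting constructor: builds a whole new string.
    Utf16String(Latin1View v)
    {
        ++conversions;
        units.reserve(v.size);
        for (std::size_t i = 0; i < v.size; ++i)
            units.push_back(static_cast<char16_t>(static_cast<unsigned char>(v.data[i])));
    }

    // Dedicated Latin-1 overload: widens straight into this string,
    // so no temporary Utf16String is created.
    void append(Latin1View v)
    {
        for (std::size_t i = 0; i < v.size; ++i)
            units.push_back(static_cast<char16_t>(static_cast<unsigned char>(v.data[i])));
    }

    // No Latin-1 overload here: a Latin1View argument is first converted
    // into a temporary Utf16String by the constructor above.
    bool startsWith(const Utf16String &prefix) const
    {
        return units.compare(0, prefix.units.size(), prefix.units) == 0;
    }
};

int Utf16String::conversions = 0;
```

Calling append(Latin1View("abc")) leaves the counter untouched, while startsWith(Latin1View("ab")) bumps it by one because of the temporary.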
> Now, the not-so-good news: operator==(QString) is a bit (just a bit)
> slower than operator==(QLatin1String) for short strings.
> It seems that, for short strings, the overhead of calling qMemEquals()
> and performing its "housecleaning chores" outweigh the benefits of its
> fast comparison loop.
Sounds like the result I found in the investigation of using SIMD in QString:
the best algorithm is the least complex possible. There are some better
algorithms for qMemEquals in tests/benchmarks/corelib/tools/qstring.cpp (the
ucstrncmp functions). But as the comment above qMemEquals shows, the
performance varies a lot depending on the architecture.
> In other words, if someone were to optimize QString::operator==(QString)
> to perform better for small strings, the total replacement would be a
> done deal.
The code is already there. Just replace qMemEquals with the contents of
ucstrncmp_sse2, but you need to keep the generic code for other architectures.
They'll still benefit from the unrolling of the loop for small strings (fewer
than 8 characters).
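
For illustration only, this is the general shape of "unroll for short strings, fall back to the generic path otherwise". It is not the ucstrncmp_sse2 code from the benchmark; utf16Equals is a made-up name, and std::memcmp merely stands in for whatever generic comparison loop is kept.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: equality test over UTF-16 code units with an
// unrolled fast path for fewer than 8 characters.
bool utf16Equals(const char16_t *a, const char16_t *b, std::size_t len)
{
    if (len >= 8) {
        // Generic path for longer strings.
        return std::memcmp(a, b, len * sizeof(char16_t)) == 0;
    }

    // Short strings: jump to the right case and compare the remaining
    // characters one by one, with no loop counter or setup work.
    switch (len) {
    case 7: if (a[6] != b[6]) return false;
        // fall through
    case 6: if (a[5] != b[5]) return false;
        // fall through
    case 5: if (a[4] != b[4]) return false;
        // fall through
    case 4: if (a[3] != b[3]) return false;
        // fall through
    case 3: if (a[2] != b[2]) return false;
        // fall through
    case 2: if (a[1] != b[1]) return false;
        // fall through
    case 1: if (a[0] != b[0]) return false;
        // fall through
    default: break;   // len == 0: nothing left to compare
    }
    return true;
}
```

Whether this beats the existing code is exactly the kind of thing that has to be measured per architecture.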
As for Neon optimisation, the lack of a "movemask" instruction like SSE2 makes
it very hard to produce optimal code. If you look at fromUtf8_neon, you'll see
you need to execute two Neon instructions, two comparisons and then rbit and
clz. If anyone wants to try this, be my guest. I won't be doing any more Neon
optimisations :-)
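
For anyone wondering why movemask matters so much: below is a scalar model of the "compare the lanes, collapse the result to a bitmask, find the first set bit" pattern. It is plain C++ with made-up function names, not SSE2 or Neon code, and it simplifies to one bit per 16-bit lane (the real SSE2 byte movemask yields one bit per byte). On SSE2 the mask step is a single instruction; on Neon it has to be pieced together from several.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of "compare 8 lanes, then movemask":
// bit i of the result is set when lane i differs.
std::uint32_t mismatchMask8(const char16_t *a, const char16_t *b)
{
    std::uint32_t mask = 0;
    for (int lane = 0; lane < 8; ++lane)
        mask |= static_cast<std::uint32_t>(a[lane] != b[lane]) << lane;
    return mask;
}

// Index of the first differing lane, or 8 when all lanes match.
// Real SIMD code would use a count-trailing-zeros instruction here.
int firstMismatch8(const char16_t *a, const char16_t *b)
{
    std::uint32_t mask = mismatchMask8(a, b);
    int index = 0;
    while (index < 8 && (mask & 1u) == 0) {
        mask >>= 1;
        ++index;
    }
    return index;
}
```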
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
