On Friday, 30 September 2011 12:57:28 Kent Hansen wrote:
> Hi,
> You might have seen Thiago's blog about QStringLiteral [1], and his idea
> on replacing QLatin1String usage by QStringLiteral in Qt (where possible).
> 
> I like the idea, but wanted to do some benchmarking first to get an
> impression of the performance impact. [2]
> 
> My results so far (on Linux 32-bit) indicate that QString::appends are
> way faster when switching to using QStringLiteral: 7x faster than
> QLatin1String for a 2-character literal and 14x for a ~50-character literal.

Not unexpected. Appending a QLatin1String requires converting Latin-1 to
UTF-16 at runtime, whereas with QStringLiteral the compiler does that
conversion at build time.
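
To make the difference concrete, here is a rough sketch in plain standard C++
(not Qt code). The names appendLatin1 and appendUtf16 are made up for
illustration, and std::u16string merely stands in for QString. The first
function has to widen every byte at runtime; the second copies data that the
compiler already stored as UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Runtime path (hypothetical helper): widen each Latin-1 byte to a UTF-16
// code unit. Latin-1 maps 1:1 onto U+0000..U+00FF, so zero-extension suffices.
void appendLatin1(std::u16string &dst, const char *latin1, std::size_t len)
{
    assert(latin1 != nullptr || len == 0);
    dst.reserve(dst.size() + len);
    for (std::size_t i = 0; i < len; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        dst.push_back(static_cast<char16_t>(static_cast<unsigned char>(latin1[i])));
    }
}

// Compile-time path (hypothetical helper): a u"..." literal is already
// UTF-16 in the binary, so appending it is a plain copy.
void appendUtf16(std::u16string &dst, const char16_t *utf16, std::size_t len)
{
    assert(utf16 != nullptr || len == 0);
    dst.append(utf16, len);
}
```

Both calls produce the same string; only the first one pays for a per-character
conversion loop on every call.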

And that's assuming you didn't get an implicit conversion to QString
somewhere. QString::append has a QLatin1String overload, but some methods
don't, and those cause a temporary to be created, which involves a malloc
(non-deterministic time).

> Now, the not-so-good news: operator==(QString) is a bit (just a bit)
> slower than operator==(QLatin1String) for short strings.
> It seems that, for short strings, the overhead of calling qMemEquals()
> and performing its "housecleaning chores" outweighs the benefits of its
> fast comparison loop.

Sounds like the result I found in the investigation of using SIMD in QString: 
the best algorithm is the least complex possible. There are some better 
algorithms for qMemEquals in tests/benchmarks/corelib/tools/qstring.cpp (the 
ucstrncmp functions). But as the comment above qMemEquals shows, the 
performance varies a lot depending on the architecture.

> In other words, if someone were to optimize QString::operator==(QString)
> to perform better for small strings, the total replacement would be a
> done deal.

The code is already there. Just replace qMemEquals with the contents of
ucstrncmp_sse2, but you need to keep the generic code for other architectures.
They'll still benefit from the unrolling of the loop for small strings (fewer
than 8 characters).
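
As an illustration of what unrolling for small strings means, here is a minimal
sketch in standard C++. It is not the actual qMemEquals or ucstrncmp_sse2 code;
equalsUtf16 is a hypothetical name, and the threshold of 8 is simply taken from
the figure above. Short inputs are compared with straight-line code and no
loop setup, while longer ones fall back to memcmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: equality of two UTF-16 buffers of the same length.
bool equalsUtf16(const char16_t *a, const char16_t *b, std::size_t len)
{
    assert((a != nullptr && b != nullptr) || len == 0);
    if (a == b || len == 0)
        return true;

    if (len < 8) {
        // Unrolled comparison: enter at the last character and fall
        // through down to the first. No loop counter, no call overhead.
        switch (len) {
        case 7: if (a[6] != b[6]) return false; // fall through
        case 6: if (a[5] != b[5]) return false; // fall through
        case 5: if (a[4] != b[4]) return false; // fall through
        case 4: if (a[3] != b[3]) return false; // fall through
        case 3: if (a[2] != b[2]) return false; // fall through
        case 2: if (a[1] != b[1]) return false; // fall through
        case 1: return a[0] == b[0];
        }
    }

    // Generic path for longer strings; a SIMD loop could replace this.
    return std::memcmp(a, b, len * sizeof(char16_t)) == 0;
}
```

Whether this actually beats a simple loop depends on the compiler and the
architecture, so it needs to be benchmarked rather than assumed.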

As for Neon optimisation, the lack of a "movemask" instruction like SSE2 makes 
it very hard to produce optimal code. If you look at fromUtf8_neon, you'll see 
you need to execute two Neon instructions, two comparisons and then rbit and 
clz. If anyone wants to try this, be my guest. I won't be doing any more Neon 
optimisations :-)
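
For reference, this is roughly what "movemask" buys you on SSE2. The sketch
below is an assumption-laden illustration, not code from Qt: firstMismatch8 is
a made-up helper that compares 8 UTF-16 units at once and reports the index of
the first difference. A scalar fallback is included so that it also builds on
targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_HAVE_SSE2 1
#endif

// Hypothetical helper: index of the first differing UTF-16 unit among the
// 8 units at a and b, or 8 if all of them are equal.
int firstMismatch8(const char16_t *a, const char16_t *b)
{
    assert(a != nullptr && b != nullptr);
#ifdef SKETCH_HAVE_SSE2
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i *>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i *>(b));
    // Each equal 16-bit lane becomes 0xFFFF, each differing lane 0x0000.
    const __m128i eq = _mm_cmpeq_epi16(va, vb);
    // movemask packs the top bit of each of the 16 bytes into one integer,
    // so the whole comparison result lands in a general-purpose register.
    const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(eq));
    const unsigned diff = ~mask & 0xFFFFu;
    if (diff == 0)
        return 8;
    int bit = 0;
    while (((diff >> bit) & 1u) == 0)
        ++bit;              // a bit-scan instruction would do this in one step
    return bit / 2;         // two mask bits per 16-bit lane
#else
    for (int i = 0; i < 8; ++i) {
        if (a[i] != b[i])
            return i;
    }
    return 8;
#endif
}
```

On Neon there is no single instruction that does the movemask step, which is
where the extra instructions mentioned above come from.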

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

_______________________________________________
Qt5-feedback mailing list
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback