On Tuesday, 11 October 2011 16:05:42 Sylvain Pointeau wrote:
> > An UTF16 API is required for the engine though, as all QStrings are UTF16.
>
> How much does it cost to convert UTF-16 to UTF-8 ?
> Is it really a show-stopper for choosing PCRE?
It costs infinitely more to convert to UTF-8 than to do nothing. The conversion
takes non-zero time and the non-conversion takes no time at all. Division by
zero.
We could optimise the UTF-8 encoder -- in fact, I already have in the QUrl
refactor work because I needed that.
What worries me more is to extract offset information. Suppose the following
code:
    int pos = regexp.indexIn(str);
pcre_exec will return an offset of the match start and end for the whole
match, as well as a pair of integers for each capture. Note what the manual
says (pcreapi(3)):
    Note: these values are always byte offsets, even in UTF-8 mode. They are
    not character counts.
So we need to convert byte offsets in UTF-8 back to UTF-16 code unit (QChar)
offsets. I can't think of any algorithm that does better than linear time: we
need to scan forward from the start of the string, counting bytes and QChars
as we go.
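To make the cost concrete, here is a rough sketch of that forward scan in plain
C++. The function name utf8OffsetToUtf16 is made up for illustration; it is not
Qt or PCRE API. It assumes the input is valid UTF-8 and that the byte offset
falls on a character boundary, which should hold for offsets reported by a
UTF-8 aware matcher.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or PCRE): convert a byte offset into a
// valid UTF-8 string to the matching offset in UTF-16 code units (QChars).
// Assumes byteOffset lies on a character boundary.
std::size_t utf8OffsetToUtf16(const std::string &utf8, std::size_t byteOffset)
{
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteOffset && i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if ((c & 0xC0) == 0x80)
            continue;               // continuation byte, already accounted for
        // A lead byte >= 0xF0 starts a 4-byte sequence, i.e. a character
        // outside the BMP, which takes a surrogate pair (2 QChars) in UTF-16.
        units += (c >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

Every byte before the offset is visited once, so the cost grows with the
position of the match. For indexIn(), the start offset of the whole match
(the first integer of the pair pcre_exec reports) would have to go through
something like this, and so would every capture offset.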
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
_______________________________________________ Qt5-feedback mailing list [email protected] http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback
