On Tuesday, 11 October 2011 16:05:42 Sylvain Pointeau wrote:
> > An UTF16 API is required for the engine though, as all QStrings are UTF16.
> 
> How much does it cost to convert UTF-16 to UTF-8 ?
> Is it really a show-stopper for choosing PCRE?

It costs infinitely more to convert to UTF-8 than to do nothing. The conversion 
takes non-zero time and the non-conversion takes no time at all. Division by 
zero.

We could optimise the UTF-8 encoder -- in fact, I already have in the QUrl 
refactor work because I needed that.

What worries me more is extracting offset information. Consider the following 
code:

        int pos = regexp.indexIn(str);

pcre_exec will return an offset of the match start and end for the whole  
match, as well as a pair of integers for each capture. Note what the manual 
says (pcreapi(3)):

Note:  these values are always byte offsets, even in UTF-8 mode. They are not 
character counts.

So we need to convert byte offsets in UTF-8 back to UTF-16 code unit (QChar) 
offsets. I can't think of any algorithm that is better than linear: we need to 
scan forward and count bytes and QChars.
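
A minimal sketch of that forward scan, assuming the UTF-8 input is well-formed 
and the byte offset falls on a character boundary (as match and capture offsets 
do). The name utf8OffsetToUtf16 is a hypothetical helper for illustration, not 
a Qt or PCRE function, and plain std::string stands in for the converted buffer:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt or PCRE): maps a byte offset in a
// well-formed UTF-8 string to the matching offset in UTF-16 code units
// (QChars). Assumes byteOffset lies on a character boundary.
std::size_t utf8OffsetToUtf16(const std::string &utf8, std::size_t byteOffset)
{
    std::size_t units = 0;  // UTF-16 code units counted so far
    std::size_t i = 0;      // current byte position
    while (i < byteOffset && i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {          // 1-byte sequence (ASCII): one unit
            i += 1; units += 1;
        } else if (lead < 0xE0) {   // 2-byte sequence: one unit
            i += 2; units += 1;
        } else if (lead < 0xF0) {   // 3-byte sequence: one unit
            i += 3; units += 1;
        } else {                    // 4-byte sequence: surrogate pair, two units
            i += 4; units += 2;
        }
    }
    return units;
}
```

For the UTF-8 bytes of "héllo" (68 C3 A9 6C 6C 6F), a match on "llo" starts at 
byte offset 3, which this maps to QChar offset 2. The cost is proportional to 
the offset being converted, and it is paid for the match start, the match end, 
and every capture unless the scan is shared between them.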

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
   Software Architect - Intel Open Source Technology Center
      PGP/GPG: 0x6EF45358; fingerprint:
      E067 918B B660 DBD1 105C  966C 33F5 F005 6EF4 5358

_______________________________________________
Qt5-feedback mailing list
[email protected]
http://lists.qt.nokia.com/mailman/listinfo/qt5-feedback