Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-27 Thread Arnaud Clère
> -Original Message-
> From: Jason H  
> Sent: vendredi 25 janvier 2019 17:40
> Cc: development@qt-project.org
> Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString
>
> > By all means, let's make sure the internals are efficient for the more 
> > common languages and scripts; but it's way past time to start doing 
> > Unicode properly, so that all cultures are well-served by default, 
> > when the software folk are using is built on Qt.
>
> I don't think anyone knows what "properly" is. 

+1

> But the more I think about it, the more I like the idea I expressed as a list 
> of sequences of various character sizes. 
> I think it is a good balance between space and efficiency.

It looks like the proposed boost::text::unencoded_rope to me, except that they 
chose to implement it as a tree of strings. 
https://github.com/boostcon/cppnow_presentations_2018/blob/master/05-07-2018_monday/boost_text_fixing_std_string_and_adding_unicode_to_standard_cpp__zach_laine__cppnow_2018__05072018.pdf
 
It makes more sense to me if you consider that efficiently editing large 
strings is not so common.

___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 13:39:49 PST Konstantin Tokarev wrote:
> > All living languages are supposed to be stored in the BMP, which means no
> > UTF-16 surrogate pairs to encode them.
> 
> AFAIK all emojis are encoded with surrogate pairs

Emojis are not part of a living language. They're drawings. But yes, they're 
outside the BMP.

In any case, they're often represented by more than one codepoint anyway, so 
whether we used N*2 UTF-16 code units to represent them or N UTF-32 code 
units, it makes no difference. Your code needs to know how to deal with them, 
where to properly break, how to combine them, how to calculate the width, etc.

Also note how they'd be represented by N*4 bytes in UTF-8, which means all 
three representations take exactly the same amount of memory.
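
A minimal sketch of that arithmetic in plain standard C++ (not Qt). The function names are hypothetical and exist only for illustration; they return the number of bytes one code point needs in each encoding:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes needed to encode one Unicode code point in UTF-8.
inline std::size_t utf8Bytes(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP,
// a surrogate pair (two units) outside it.
inline std::size_t utf16Bytes(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// UTF-32 always uses a single 32-bit unit.
inline std::size_t utf32Bytes(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    return 4;
}
```

For any code point outside the BMP, such as U+1F600, all three functions return 4, which is the point made above. Inside the BMP the three encodings differ.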

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Konstantin Tokarev


25.01.2019, 23:33, "Thiago Macieira" :
> On Friday, 25 January 2019 04:54:22 PST Edward Welbourne wrote:
>>  we
>>  fail to properly support cultures whose scripts are relegated to the
>>  outer planes of Unicode - as, for example, the Chakma language's number
>>  system
>
> All living languages are supposed to be stored in the BMP, which means no
> UTF-16 surrogate pairs to encode them.

AFAIK all emojis are encoded with surrogate pairs

>
> That doesn't mean a single code unit, mind you. Think of combining characters.
>
> --
> Thiago Macieira - thiago.macieira (AT) intel.com
>   Software Architect - Intel Open Source Technology Center
>

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 04:54:22 PST Edward Welbourne wrote:
> we
> fail to properly support cultures whose scripts are relegated to the
> outer planes of Unicode - as, for example, the Chakma language's number
> system

All living languages are supposed to be stored in the BMP, which means no 
UTF-16 surrogate pairs to encode them.

That doesn't mean a single code unit, mind you. Think of combining characters.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 08:54:38 PST Konstantin Tokarev wrote:
> > How often do you need that, outside of QString itself? And maybe a few
> > efficient QtCore classes? (QCborValue comes to mind)
> 
> Each time I need to interact efficiently with external code which isn't
> Qt-based, e.g. WebKit, ICU. In particular, this extra copy would certainly
> degrade performance of QtWebKit.
> 
> Oh and you've mentioned CBOR, this implies that it won't be possible for Qt
> users to make efficient implementation of a different serialization format.

I didn't say we shouldn't have it. I was just trying to gather information 
about the need.

So it looks like we do need it, if we ever change the encoding. My worry is 
that people will fail to handle the combinations properly. Which is why I 
dislike different encodings even more than changing it wholesale with an API-
breaking change.

However, one of my pending Qt 6 changes is to store a flag in QString that 
says "this UTF-16 string is known to contain only US-ASCII characters". That 
way, toUtf8() can use the faster toLatin1() algorithm (the flag is set by 
toUtf8() and toLatin1() the first time they're called). The problem is that it 
needs to clear that flag in all detach() calls.
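
A rough sketch of the idea in plain standard C++, not the pending Qt change itself. `FlaggedString` and its members are hypothetical names, and `append()` stands in for every mutating path that would have to clear the flag:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a UTF-16 string that caches "known to be US-ASCII".
class FlaggedString
{
public:
    explicit FlaggedString(std::u16string s) : m_data(std::move(s)) {}

    // Every mutation must drop the flag, as detach() would have to in QString.
    void append(char16_t unit)
    {
        m_data.push_back(unit);
        m_knownAscii = false;
    }

    bool knownAscii() const { return m_knownAscii; }

    std::string toUtf8() const
    {
        // The flag is set lazily, the first time a conversion scans the data.
        if (!m_knownAscii) {
            m_knownAscii = std::all_of(m_data.begin(), m_data.end(),
                                       [](char16_t u) { return u < 0x80; });
        }
        std::string out;
        if (m_knownAscii) {
            // Fast path: each code unit narrows to exactly one byte.
            out.reserve(m_data.size());
            for (char16_t u : m_data)
                out.push_back(static_cast<char>(u));
            return out;
        }
        // Slow path: general UTF-16 to UTF-8, assuming well-formed surrogates.
        for (std::size_t i = 0; i < m_data.size(); ++i) {
            char32_t cp = m_data[i];
            if (cp >= 0xD800 && cp <= 0xDBFF) {
                assert(i + 1 < m_data.size());
                cp = 0x10000 + ((cp - 0xD800) << 10) + (m_data[i + 1] - 0xDC00);
                ++i;
            }
            if (cp < 0x80) {
                out.push_back(static_cast<char>(cp));
            } else if (cp < 0x800) {
                out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
                out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
            } else if (cp < 0x10000) {
                out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
                out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
                out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
            } else {
                out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
                out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
                out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
                out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
            }
        }
        return out;
    }

private:
    std::u16string m_data;
    mutable bool m_knownAscii = false; // cached knowledge, not part of the value
};
```

In this sketch a string that is not pure ASCII is rescanned on every conversion, because a single bit can only record the positive answer.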

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Giuseppe D'Angelo via Development

On 25/01/19 10:49, Dominik Haumann wrote:

> Sidenote: Such a QStringIterator would also be helpful for
> KTextEditor, where we likely have some bugs we usually never see since
> we never have > UTF16 or composed characters.


I've managed to merge it in QtCore some 5 years ago, comes with docs and 
tests:


https://codereview.qt-project.org/#/c/77136/


You can use it today:

CONFIG += core-private
#include 


It's still missing a couple of bits and bolts to turn it public -- most 
notably, ranged for / STL loop support. I'd also like to 
investigate more how it overlaps with SG16 / Boost.Text / etc. efforts 
before publishing the current API.


My 2 c,
--
Giuseppe D'Angelo | giuseppe.dang...@kdab.com | Senior Software Engineer
KDAB (France) S.A.S., a KDAB Group company
Tel. France +33 (0)4 90 84 08 53, http://www.kdab.com
KDAB - The Qt, C++ and OpenGL Experts





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Konstantin Tokarev


25.01.2019, 01:02, "Thiago Macieira" :
> On Thursday, 24 January 2019 05:06:58 PST Konstantin Tokarev wrote:
>>  I will be officially pissed off if possibility to access raw data of QString
>>  without extra copy is gone :( It would be better if there is a way to figure
>>  out internal storage encoding (e.g. isUtf16()) and access raw data
>
> How often do you need that, outside of QString itself? And maybe a few
> efficient QtCore classes? (QCborValue comes to mind)

Each time I need to interact efficiently with external code which isn't Qt-based,
e.g. WebKit, ICU. In particular, this extra copy would certainly degrade
performance of QtWebKit.

Oh and you've mentioned CBOR, this implies that it won't be possible for Qt
users to make efficient implementation of a different serialization format.

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Jason H
> By all means, let's make sure the internals are efficient for the more
> common languages and scripts; but it's way past time to start doing
> Unicode properly, so that all cultures are well-served by default, when
> the software folk are using is built on Qt.

I don't think anyone knows what "properly" is. But the more I think about it, 
the more I like the idea I expressed as a list of sequences of various 
character sizes. I think it is a good balance between space and efficiency. To 
recap that:
A class that stores a list of lists of same-width characters. For the most naive 
case the list is 1 list long and contains only 8bit characters. This performs 
identically to QByteArray. Non-ASCII languages requiring 16-bit storage are as 
QStrings are now. Then, in the more complicated scenarios, it breaks out 8-bit 
segments and 16-bit segments and makes them appear contiguous. (Emoji in ASCII 
text). Of course there could be functions to collapse it all to the uniform 
largest used width (maximize()) or break it apart to minimize() space (for very 
long 8-bit strings with occasional characters), and there can even be a 
bestFit() heuristic. And as always you can get it serialized as UTF-8 or 16... 
All of the above extends to 32-bit as well. I think this handles the 
average case very well (all characters of same width) and has reasonable cost 
for occasional exotic characters. 
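
A minimal sketch of such a class in plain standard C++17, under several assumptions: `SegmentedString` and its members are hypothetical names, 8-bit runs hold ASCII only, and `maximized()` always widens to 32-bit units rather than to the largest width actually used:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// One run of same-width characters, stored with 8-, 16- or 32-bit units.
using Segment = std::variant<std::string, std::u16string, std::u32string>;

class SegmentedString
{
public:
    // Appends one code point to the last run, or starts a new run when the
    // width it needs differs from the width of the last run.
    void append(char32_t cp)
    {
        assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
        if (cp < 0x80)
            push<std::string>(static_cast<char>(cp));
        else if (cp < 0x10000)
            push<std::u16string>(static_cast<char16_t>(cp));
        else
            push<std::u32string>(cp);
    }

    std::size_t segmentCount() const { return m_segments.size(); }

    // Number of code points; every unit in every run is one code point.
    std::size_t length() const
    {
        std::size_t n = 0;
        for (const Segment &seg : m_segments)
            n += std::visit([](const auto &s) { return s.size(); }, seg);
        return n;
    }

    // Bytes used by the character data itself (bookkeeping not counted).
    std::size_t storageBytes() const
    {
        std::size_t n = 0;
        for (const Segment &seg : m_segments)
            n += std::visit([](const auto &s) {
                using Unit = typename std::decay_t<decltype(s)>::value_type;
                return s.size() * sizeof(Unit);
            }, seg);
        return n;
    }

    // Collapses all runs into one uniform 32-bit sequence.
    std::u32string maximized() const
    {
        std::u32string out;
        for (const Segment &seg : m_segments)
            std::visit([&out](const auto &s) {
                for (auto unit : s)
                    out.push_back(static_cast<char32_t>(unit));
            }, seg);
        return out;
    }

private:
    template <typename S>
    void push(typename S::value_type unit)
    {
        if (m_segments.empty() || !std::holds_alternative<S>(m_segments.back()))
            m_segments.emplace_back(S());
        std::get<S>(m_segments.back()).push_back(unit);
    }

    std::vector<Segment> m_segments;
};
```

Appending "hi", one emoji and "!" yields three runs holding 7 bytes of character data, against 16 bytes as uniform UTF-32. The sketch leaves out indexing, which would need a lookup across runs.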


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Edward Welbourne
Arnaud Clère (25 January 2019 10:59) wrote:
> Most user code I have written or seen handles text data naively and is
> incorrect in some respect but I think only a minority of it is leading
> to real problems because input data will rarely trigger them.

That depends a lot on who's supplying your data.  The same rationale was
given for "making do" with old 8-bit encodings, which meant programs
worked for various rich nations' primary languages and didn't for anyone
else's.  Then we switched to UTF-16, which let us continue not thinking
about what we're really doing, while reaching a larger slice of the
world.  Still, that leaves us complicit in suppressing various minority
cultures by making software that works for the dominant culture around
them, but not for them.

Until we get into the habit of thinking of text properly (and I still
don't even know the terminology, so I have a way to go on this, just
like anyone) instead of as a sequence of evenly-sized units, we're going
to continue either being inefficient (because we use units that are
bigger than needed for many use-cases - arguably true of UTF-16) or we
fail to properly support cultures whose scripts are relegated to the
outer planes of Unicode - as, for example, the Chakma language's number
system, which QLocale currently can't represent (QTBUG-69324) because
the digits don't fit in a single UTF-16 unit (as QLocaleData expects of
digits, signs and quotes, though it understands most of its other
locale-specific texts might be longer).  As a result, we can't support
any Chakma locale.
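
To make the size problem concrete, here is a small sketch in plain standard C++ (the function name `toUtf16` is hypothetical and unrelated to any Qt API). It encodes one code point as UTF-16 and shows that U+11136, CHAKMA DIGIT ZERO, needs a surrogate pair:

```cpp
#include <cassert>
#include <string>

// Encodes one code point as UTF-16. Code points above U+FFFF do not fit in
// a single 16-bit unit and are split into a high and a low surrogate.
inline std::u16string toUtf16(char32_t cp)
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));
    const char32_t v = cp - 0x10000;                            // 20 bits
    std::u16string out;
    out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // top 10 bits
    out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low 10 bits
    return out;
}
```

`toUtf16(0x11136)` yields the two units 0xD804 0xDD36, whereas an ASCII digit yields one unit, so any data structure that reserves one UTF-16 unit per digit cannot hold the Chakma digits.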

By all means, let's make sure the internals are efficient for the more
common languages and scripts; but it's way past time to start doing
Unicode properly, so that all cultures are well-served by default, when
the software folk are using is built on Qt.

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Arnaud Clère
> Original Message-
> From: Thiago Macieira  
>
> But we WILL NOT change from UTF-16 in the next 2 years. 

From a user standpoint, this seems perfectly Ok to me. 
I do not buy the argument that if switching QString to utf8 makes developer bugs 
appear sooner, this is a good thing.

Most user code I have written or seen handles text data naively and is 
incorrect in some respect but I think only a minority of it is leading to real 
problems because input data will rarely trigger them.
Although not perfect, using 16 bits "characters" for QString and Windows API is 
a good approximation that helped a lot to make user code more robust without 
requiring understanding charsets and encodings.
At least, it saved me a lot of time if I remember correctly the kind of bugs I 
was dealing with in the 90's.

So, IMHO, accessing QString content in utf8 "character" units should remain an 
explicit choice, not the default one.
Even choosing utf8 internally for QString for performance reasons seems dubious 
to me, at least for a good half of the world...
___
Development mailing list
Development@qt-project.org
https://lists.qt-project.org/listinfo/development


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Dominik Haumann
On Thu, Jan 24, 2019 at 10:57 PM Thiago Macieira
 wrote:
>
> On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote:
> >   - Introduce some iterator that iterates over unicode code points.
>
> I wrote that about a decade ago. It's called QStringIterator and it's inside
> our sources, but in a private header.
>
> But we may want to make it iterate over grapheme clusters instead of Unicode
> codepoints. That is, make it use QTextBoundaryFinder to iterate, instead of
> decode the storage to UTF-32.
> [...]

Sidenote: Such a QStringIterator would also be helpful for
KTextEditor, where we likely have some bugs we usually never see since
we never have > UTF16 or composed characters.

Greetings
Dominik


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Thiago Macieira
On Thursday, 24 January 2019 05:06:58 PST Konstantin Tokarev wrote:
> I will be officially pissed off if possibility to access raw data of QString
> without extra copy is gone :( It would be better if there is a way to figure
> out internal storage encoding (e.g. isUtf16()) and access raw data

How often do you need that, outside of QString itself? And maybe a few 
efficient QtCore classes? (QCborValue comes to mind)

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Thiago Macieira
On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote:
>   - Introduce some iterator that iterates over unicode code points.

I wrote that about a decade ago. It's called QStringIterator and it's inside 
our sources, but in a private header.

But we may want to make it iterate over grapheme clusters instead of Unicode 
codepoints. That is, make it use QTextBoundaryFinder to iterate, instead of 
decode the storage to UTF-32.
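
For readers who have not seen the private class, the code point flavour of such an iterator can be sketched in plain standard C++ as follows. This is not the QStringIterator source; `CodePointIterator` and `decodeAll` are hypothetical names, and replacing a lone surrogate with U+FFFD is an assumption of this sketch:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Walks UTF-16 storage and yields one Unicode code point per step.
// The iterator only refers to the string, so the string must outlive it.
class CodePointIterator
{
public:
    explicit CodePointIterator(const std::u16string &s) : m_s(s) {}

    bool hasNext() const { return m_pos < m_s.size(); }

    char32_t next()
    {
        assert(hasNext());
        const char16_t u = m_s[m_pos++];
        if (u >= 0xD800 && u <= 0xDBFF && m_pos < m_s.size()) {
            const char16_t low = m_s[m_pos];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                ++m_pos; // consume the second half of the pair
                return 0x10000 + ((char32_t(u) - 0xD800) << 10)
                               + (char32_t(low) - 0xDC00);
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF)
            return 0xFFFD; // lone surrogate: substitute the replacement character
        return u;
    }

private:
    const std::u16string &m_s;
    std::size_t m_pos = 0;
};

// Convenience helper: decodes the whole string to UTF-32.
inline std::u32string decodeAll(const std::u16string &s)
{
    std::u32string out;
    for (CodePointIterator it(s); it.hasNext();)
        out.push_back(it.next());
    return out;
}
```

Iterating over grapheme clusters instead would need the Unicode segmentation rules on top of this, which is what QTextBoundaryFinder provides.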

>   - Deprecate utf16()  and other API that assume that QString is UTF-16
>   - Replace them by a toUtf16 which returns a QVector.  I believe
> that it is possible to make the content implicitly shared with the QString,
> avoiding copies. (since it is just a QTypedArrayData internally)

QVector.

Sharing QVector and QString is possible, but we need to fix a few 
discrepancies, especially that of QVector not being allowed to be raw data, 
while QString can be (QVector::fromRawData was proposed for Qt 5.0 [Andreas 
Hartmetz, if I'm not mistaken] but we never added it). So this is fixable for 
Qt 6, but not before Qt 6.

I think I tried even in my branch and ran into a lot of trouble. It was a non-
obvious change. So I abandoned it.

Still, we're not going to switch away from UTF-16 in Qt 6. The best we can do 
is pave the way for switching in Qt 7, if we add the methods you're talking 
about, change ALL the Windows, Cocoa and Android code that calls .data() and 
assumes it to be UTF-16 to toUtf16(). We may want to have some #defines like 
the QStringView string level or the ASCII-cast ones, so we catch those.

But we WILL NOT change from UTF-16 in the next 2 years. 

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Konstantin Tokarev


24.01.2019, 10:34, "Olivier Goffart" :
> On 23.01.19 23:15, André Pönitz wrote:
>>  On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
>>>  23.01.2019, 16:55, "Edward Welbourne" :
>>>>  All of this discussion ignores a major elephant: QString's indexing is
>>>>  by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
>>>>  for a couple of decades now.
>>>>
>>>>  We *should* have a string type (I don't care what you call it) that acts
>>>>  on strings indexed by Unicode characters, not in terms of a
>>>>  representation. Whether that string type internally uses UTF-16 or
>>>>  UTF-8 should be invisible to its user. Ideally it would be capable of
>>>>  carrying its data internally in either form (so as to avoid needless
>>>>  conversion when both producer and consumer use the same form) and of
>>>>  converting between the two (e.g. so as to append efficiently) as needed.
>>>
>>>  I think this is excessive. Most common operations with strings in application
>>>  code are:
>>>
>>>  * Pass the string around or compare as an opaque token
>>>  * Draw the string on screen e.g. with QPainter (while technically it
>>>    falls in the previous category, I think it's important enough to
>>>    deserve separate item)
>>>  * Find substring or pattern (regex) inside the string
>>>  * Split the string by character, pattern, or index boundaries found by means
>>>    of previous item
>>>
>>>  I think the only common cases when dealing with Unicode grapheme clusters
>>>  is required are
>>>
>>>  * Handling of text cursor movement
>>>  * Implementation of text shaping, i.e. what Harfbuzz is doing
>>>
>>>  I think having special iterator would be quite enough for cursor case. Such
>>>  iterator could abstract away underlying encoding, instead of forcing everyone
>>>  to convert to UTF-16 first.
>>
>>  All of that is scarily close to my opinion on the topic.
>
> Same here. I think Konstantin is spot on.
>
> Another example of good string design, I think, is Rust's String. Their
> string is encoded in valid UTF-8, indexed by bytes, and splitting the string in
> the middle of a code point is a programmer error.
>
> As already mentioned before, UTF-16 is quite a bad choice, if it weren't for
> legacy.
>
> The argument that a developer wrongly using indexes causes more problems with
> utf-8 than with utf-16 ("it would happen for a lot more characters") actually
> means that the developer will see and fix their bugs quickly.
>
> I understand changing QString to UTF-8 is a difficult task if we want to do it
> in a compatible way. However, I think there is a way:
> In Qt5.x:
>   - Introduce some iterator that iterates over unicode code points.
>   - Deprecate utf16() and other API that assume that QString is UTF-16
>   - Replace them by a toUtf16 which returns a QVector. I believe that
> it is possible to make the content implicitly shared with the QString, avoiding
> copies. (since it is just a QTypedArrayData internally)

I will be officially pissed off if possibility to access raw data of QString
without extra copy is gone :( It would be better if there is a way to figure out
internal storage encoding (e.g. isUtf16()) and access raw data

>
> Then in Qt6 one can simply change the representation without breaking
> compatibility with non-deprecated functions.
>
> --
> Olivier
>
> Woboq - Qt services and support - https://woboq.com - https://code.woboq.org
>

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Konstantin Ritt
>  - Introduce some iterator that iterates over unicode code points.

QStringIterator


> We *should* have a string type (I don't care what you call it) that acts
> on strings indexed by Unicode characters, not in terms of a
> representation.  Whether that string type internally uses UTF-16 or
> UTF-8 should be invisible to its user.  Ideally it would be capable of
> carrying its data internally in either form (so as to avoid needless
> conversion when both producer and consumer use the same form) and of
> converting between the two (e.g. so as to append efficiently) as needed.

That's what I'd support with both hands.
However, I don't think we could do that on QString without breaking most of
the existing code.

P.S. \note Unicode operates on "code points" not "characters". And
moreover, there is no such thing as a "glyph" in a Unicode string.
And looking for grapheme or glyph boundary is clearly not a string
storage's or a string view's responsibility.

Regards,
Konstantin


чт, 24 янв. 2019 г. в 10:33, Olivier Goffart :

> On 23.01.19 23:15, André Pönitz wrote:
> > On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
> >> 23.01.2019, 16:55, "Edward Welbourne" :
> >>> All of this discussion ignores a major elephant: QString's indexing is
> >>> by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
> >>> for a couple of decades now.
> >>>
> >>> We *should* have a string type (I don't care what you call it) that acts
> >>> on strings indexed by Unicode characters, not in terms of a
> >>> representation. Whether that string type internally uses UTF-16 or
> >>> UTF-8 should be invisible to its user. Ideally it would be capable of
> >>> carrying its data internally in either form (so as to avoid needless
> >>> conversion when both producer and consumer use the same form) and of
> >>> converting between the two (e.g. so as to append efficiently) as needed.
> >>
> >> I think this is excessive. Most common operations with strings in application
> >> code are:
> >>
> >> * Pass the string around or compare as an opaque token
> >> * Draw the string on screen e.g. with QPainter (while technically it
> >>   falls in the previous category, I think it's important enough to
> >>   deserve separate item)
> >> * Find substring or pattern (regex) inside the string
> >> * Split the string by character, pattern, or index boundaries found by means
> >>   of previous item
> >>
> >> I think the only common cases when dealing with Unicode grapheme clusters
> >> is required are
> >>
> >> * Handling of text cursor movement
> >> * Implementation of text shaping, i.e. what Harfbuzz is doing
> >>
> >> I think having special iterator would be quite enough for cursor case. Such
> >> iterator could abstract away underlying encoding, instead of forcing everyone
> >> to convert to UTF-16 first.
> >
> > All of that is scarily close to my opinion on the topic.
>
> Same here. I think Konstantin is spot on.
>
> Another example of good string design, I think, is Rust's String. Their
> string is encoded in valid UTF-8, indexed by bytes, and splitting the string in
> the middle of a code point is a programmer error.
>
> As already mentioned before, UTF-16 is quite a bad choice, if it weren't for
> legacy.
>
> The argument that a developer wrongly using indexes causes more problems with
> utf-8 than with utf-16 ("it would happen for a lot more characters") actually
> means that the developer will see and fix their bugs quickly.
>
> I understand changing QString to UTF-8 is a difficult task if we want to do it
> in a compatible way. However, I think there is a way:
> In Qt5.x:
>   - Introduce some iterator that iterates over unicode code points.
>   - Deprecate utf16()  and other API that assume that QString is UTF-16
>   - Replace them by a toUtf16 which returns a QVector.  I believe that
> it is possible to make the content implicitly shared with the QString, avoiding
> copies. (since it is just a QTypedArrayData internally)
>
> Then in Qt6 one can simply change the representation without breaking
> compatibility with non-deprecated functions.
>
> --
> Olivier
>
> Woboq - Qt services and support - https://woboq.com -
> https://code.woboq.org
>
>
>
>


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Olivier Goffart

On 23.01.19 23:15, André Pönitz wrote:
> On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
>> 23.01.2019, 16:55, "Edward Welbourne" :
>>> All of this discussion ignores a major elephant: QString's indexing is
>>> by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
>>> for a couple of decades now.
>>>
>>> We *should* have a string type (I don't care what you call it) that acts
>>> on strings indexed by Unicode characters, not in terms of a
>>> representation. Whether that string type internally uses UTF-16 or
>>> UTF-8 should be invisible to its user. Ideally it would be capable of
>>> carrying its data internally in either form (so as to avoid needless
>>> conversion when both producer and consumer use the same form) and of
>>> converting between the two (e.g. so as to append efficiently) as needed.
>>
>> I think this is excessive. Most common operations with strings in application
>> code are:
>>
>> * Pass the string around or compare as an opaque token
>> * Draw the string on screen e.g. with QPainter (while technically it
>>   falls in the previous category, I think it's important enough to
>>   deserve separate item)
>> * Find substring or pattern (regex) inside the string
>> * Split the string by character, pattern, or index boundaries found by means
>>   of previous item
>>
>> I think the only common cases when dealing with Unicode grapheme clusters
>> is required are
>>
>> * Handling of text cursor movement
>> * Implementation of text shaping, i.e. what Harfbuzz is doing
>>
>> I think having special iterator would be quite enough for cursor case. Such
>> iterator could abstract away underlying encoding, instead of forcing everyone
>> to convert to UTF-16 first.
>
> All of that is scarily close to my opinion on the topic.


Same here. I think Konstantin is spot on.

Another example of good string design, I think, is Rust's String. Their 
string is encoded in valid UTF-8, indexed by bytes, and splitting the string in 
the middle of a code point is a programmer error.
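
The boundary rule can be sketched in plain standard C++ as below. It mirrors the idea behind Rust's `str::is_char_boundary`, but `isCharBoundary` and `sliceChecked` are hypothetical helpers written for this illustration, and the input is assumed to be valid UTF-8:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if byte offset i is the start of a code point (or the end of the
// string). UTF-8 continuation bytes all have the bit pattern 10xxxxxx.
inline bool isCharBoundary(const std::string &utf8, std::size_t i)
{
    if (i == 0 || i == utf8.size())
        return true;
    if (i > utf8.size())
        return false;
    return (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80;
}

// Byte-indexed slice that treats a cut inside a code point as a
// programmer error, reported here through assert().
inline std::string sliceChecked(const std::string &utf8,
                                std::size_t begin, std::size_t end)
{
    assert(begin <= end);
    assert(isCharBoundary(utf8, begin) && isCharBoundary(utf8, end));
    return utf8.substr(begin, end - begin);
}
```

The check costs one byte comparison per index, so byte indexing stays constant time while a wrong index is still caught at the point of use.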


As already mentioned before, UTF-16 is quite a bad choice, if it weren't for 
legacy.


The argument that a developer wrongly using indexes causes more problems with 
utf-8 than with utf-16 ("it would happen for a lot more characters") actually 
means that the developer will see and fix their bugs quickly.


I understand changing QString to UTF-8 is a difficult task if we want to do it 
in a compatible way. However, I think there is a way:

In Qt5.x:
 - Introduce some iterator that iterates over unicode code points.
 - Deprecate utf16()  and other API that assume that QString is UTF-16
 - Replace them by a toUtf16 which returns a QVector.  I believe that 
it is possible to make the content implicitly shared with the QString, avoiding 
copies. (since it is just a QTypedArrayData internally)


Then in Qt6 one can simply change the representation without breaking 
compatibility with non-deprecated functions.


--
Olivier

Woboq - Qt services and support - https://woboq.com - https://code.woboq.org






Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread André Pönitz
On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote:
> 23.01.2019, 16:55, "Edward Welbourne" :
> > All of this discussion ignores a major elephant: QString's indexing is
> > by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
> > for a couple of decades now.
> >
> > We *should* have a string type (I don't care what you call it) that acts
> > on strings indexed by Unicode characters, not in terms of a
> > representation. Whether that string type internally uses UTF-16 or
> > UTF-8 should be invisible to its user. Ideally it would be capable of
> > carrying its data internally in either form (so as to avoid needless
> > conversion when both producer and consumer use the same form) and of
> > converting between the two (e.g. so as to append efficiently) as needed.
> 
> I think this is excessive. Most common operations with strings in application
> code are:
>
> * Pass the string around or compare as an opaque token
> * Draw the string on screen e.g. with QPainter (while technically it
>   falls in the previous category, I think it's important enough to
>   deserve separate item)
> * Find substring or pattern (regex) inside the string
> * Split the string by character, pattern, or index boundaries found by means
>   of previous item
> 
> I think the only common cases when dealing with Unicode grapheme clusters
> is required are
>
> * Handling of text cursor movement
> * Implementation of text shaping, i.e. what Harfbuzz is doing
> 
> I think having special iterator would be quite enough for cursor case. Such
> iterator could abstract away underlying encoding, instead of forcing everyone
> to convert to UTF-16 first.

All of that is scarily close to my opinion on the topic.

Andre'


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 06:07:37 PST Marco Bubke wrote:
> Would it be not better to use a simple container and then functions on top
> which use a view, so we could use them with any container

If only we had a class that found boundaries in text...

http://doc.qt.io/qt-5/qtextboundaryfinder.html

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Edward Welbourne
Marco Bubke (23 January 2019 15:07) wrote
> Would it be not better to use a simple container and then functions on
> top which use a view, so we could use them with any container.

That sounds just fine to me.

Indeed, in separating the "Unicode text" nature from its encoding, I'm
fine with the *storage* being the encoding and the text being a view of
that storage - just as long as we get an API that lets us deal with
every form of storage (and encoding) consistently in terms of Unicode,
when the code accessing it wants to do that.
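
One possible shape for that separation, sketched in plain standard C++17 with hypothetical names (`codePoints`, `sameText`). The storage is an ordinary container of UTF-8 or UTF-16 units, and the "text" API is a set of overloads on views of that storage; well-formed input is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Text view of UTF-8 storage: decodes the bytes to code points.
inline std::u32string codePoints(std::string_view utf8)
{
    std::u32string out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        const std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        assert(i + len <= utf8.size());
        // Keep the payload bits of the lead byte, then 6 bits per continuation.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Text view of UTF-16 storage: combines surrogate pairs into code points.
inline std::u32string codePoints(std::u16string_view utf16)
{
    std::u32string out;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t cp = utf16[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            assert(i + 1 < utf16.size());
            cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
            ++i;
        }
        out.push_back(cp);
    }
    return out;
}

// Works on any mix of storages: callers reason in code points only.
template <typename A, typename B>
bool sameText(const A &a, const B &b)
{
    return codePoints(a) == codePoints(b);
}
```

A real design would iterate lazily instead of materialising a `std::u32string`; the eager form is used here only to keep the sketch short.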

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 07:25:44 PST Jason H wrote:
> > From: "Arnaud Clère" 
> > 
> > > And I don't want to add QUtf8String until SG16's char8_t gets settled.
> > > It'll probably be settled by C++20, which means we can probably work on
> > > > this during Qt 6 lifetime, possibly even 6.1 or 6.2.
> > >
> > > It makes sense to avoid future incompatibilities with the standard but
> > > fortunately Qt sometimes chooses to solve real problems ahead in time ;-)
> Well C++20 is really how many months away? Qt6 won't be released until when?

Give me the exact answers and I'll tell you if we can have this in Qt 6.0.

The fact you can't is the problem: they're too much in flux and too close to 
each other for us to be able to accept char8_t as an established functionality 
that won't change by a later paper and design a solution for Qt 6.0. If we're 
lucky, we can do it. More likely, we'll have to wait a bit, possibly even for 
a compiler to implement it.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 05:53:00 PST Edward Welbourne wrote:
> What are our chances of getting this right in Qt 6 ?

Not bad. But what you described is what SG16 is working on for std::text. So 
let's not do something different from them. We can prototype it and be first, 
though.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Jason H
> From: "Arnaud Clère" 
> > And I don't want to add QUtf8String until SG16's char8_t gets settled. 
> > It'll probably be settled by C++20, which means we can probably work on 
> > this during Qt 6 lifetime, possibly even 6.1 or 6.2.
> 
> It makes sense to avoid future incompatibilities with the standard but 
> fortunately Qt sometimes chooses to solve real problems ahead in time  ;-)

Well C++20 is really how many months away? Qt6 won't be released until when? It 
seems like both of these might land at the same time, except that the "by 
C++20" is (AFAICT) speculation. Uptake will also be slow. But by Qt being first 
we can get experience with the nature of the solution which might help inform 
the standard, or vice-versa. There's a risk we do something that conflicts with 
the standard in a useful way that people like, then we have fragmentation. 

Far smarter people than I have worked on this, so again burn this with fire, 
but my current thinking is: 
I think the problem is how all these things are implemented - they are 
basically escape codes, so it's impossible to say where the current character 
ends and the next begins. This of course kills speed, but that's what we get 
for having more than one language on the planet plus emojis. It seems to me 
that the only real solution to keep it all fast is to progressively upgrade 
from bytes to the widest character and use that. This will have a scanning cost 
when it enters the address space if not denoted to the compiler or by the load 
method.  If memory is a concern, the only alternative I see is to create a 
complex string: "strings" are now arrays of character arrays of uniform width, 
and hope that it is only ever one:
"Ground control to Major Tom" - single sequence of 8 bit chars, len 27 size 27
"niños." encoded as 3 "strings", total length 6, size 7:
+ "ni" - "ni" (8 bit char sequence of 2 char)
+ "ñ" -  00F1 (UTF16 16 bit char sequence of 1 char)
+ "os." - "os." (8 bit char sequence of 3 char)
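
To make the layout above concrete, here is a minimal C++ sketch. The names Run
and SegmentedString are hypothetical (not an existing Qt or standard type), and
the 16-bit run is assumed to be stored little-endian:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: one run of code units that all share the same width.
struct Run {
    std::size_t unitWidth;           // bytes per code unit: 1, 2 or 4
    std::vector<std::uint8_t> data;  // raw storage; size is a multiple of unitWidth
};

// Hypothetical "complex string": a sequence of uniform-width runs.
struct SegmentedString {
    std::vector<Run> runs;

    // Number of code units, summed over all runs.
    std::size_t length() const
    {
        std::size_t n = 0;
        for (const Run &r : runs)
            n += r.data.size() / r.unitWidth;
        return n;
    }

    // Bytes of character storage (per-run bookkeeping is not counted).
    std::size_t byteSize() const
    {
        std::size_t n = 0;
        for (const Run &r : runs)
            n += r.data.size();
        return n;
    }
};
```

With the runs "ni" (1-byte units), U+00F1 (one 2-byte unit) and "os." (1-byte
units), length() is 6 and byteSize() is 7. Indexing into such a structure has
to find the right run first, which is one of the tradeoffs mentioned below.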

Over 20 years ago, in Dr Dobbs or some other print medium, I read that some 
BASIC (I forget which one) stored strings as a linked list of characters; I'm 
adapting that idea. There are many tradeoffs, but until we're ok with 32 bit 
characters, there will be tradeoffs on a multi-language planet. 

I just don't think escape codes should ever be stored in memory. Disk is fine. 

"Better to remain silent and be thought a fool than to speak and to remove all 
doubt." - (Disputed). I think I may have broken that rule here. "Please, be 
gentle." - Peter Venkman



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Konstantin Tokarev


23.01.2019, 16:55, "Edward Welbourne" :
> All of this discussion ignores a major elephant: QString's indexing is
> by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode
> for a couple of decades now.
>
> We *should* have a string type (I don't care what you call it) that acts
> on strings indexed by Unicode characters, not in terms of a
> representation. Whether that string type internally uses UTF-16 or
> UTF-8 should be invisible to its user. Ideally it would be capable of
> carrying its data internally in either form (so as to avoid needless
> conversion when both producer and consumer use the same form) and of
> converting between the two (e.g. so as to append efficiently) as needed.

I think this is excessive. The most common operations on strings in application
code are:
* Pass the string around or compare it as an opaque token
* Draw the string on screen, e.g. with QPainter (while technically this falls
into the previous category, I think it's important enough to deserve a separate
item)
* Find a substring or pattern (regex) inside the string
* Split the string by character, pattern, or index boundaries found by means
of the previous item

I think the only common cases where dealing with Unicode grapheme clusters
is required are:
* Handling of text cursor movement
* Implementation of text shaping, i.e. what Harfbuzz is doing

I think having a special iterator would be quite enough for the cursor case. Such
an iterator could abstract away the underlying encoding, instead of forcing
everyone to convert to UTF-16 first.
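
As a rough illustration of an iterator that hides the encoding, here is a
minimal sketch that walks UTF-8 storage one code point at a time. nextCodePoint
and decodeAll are hypothetical names, and the input is assumed to be well-formed
UTF-8. Stepping by grapheme cluster, which is what cursor movement really needs,
would additionally require the Unicode segmentation rules; this sketch does not
implement them:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode the code point starting at byte offset `pos`
// and advance `pos` past it. Assumes `utf8` is well-formed UTF-8.
char32_t nextCodePoint(const std::string &utf8, std::size_t &pos)
{
    const unsigned char lead = static_cast<unsigned char>(utf8[pos++]);
    if (lead < 0x80)
        return lead;                       // 1-byte sequence (US-ASCII)
    int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    char32_t cp = lead & (0x3F >> extra);  // payload bits of the lead byte
    while (extra-- > 0)                    // 6 payload bits per continuation byte
        cp = (cp << 6) | (static_cast<unsigned char>(utf8[pos++]) & 0x3F);
    return cp;
}

// Decode a whole buffer; the caller never sees the byte-level encoding.
std::vector<char32_t> decodeAll(const std::string &utf8)
{
    std::vector<char32_t> out;
    for (std::size_t pos = 0; pos < utf8.size(); )
        out.push_back(nextCodePoint(utf8, pos));
    return out;
}
```

The same interface could sit on top of UTF-16 storage with a different
nextCodePoint, which is the point of abstracting the encoding away.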

>
> Meanwhile, buffers of data (whether 8-bit, 16-bit or of other sizes) are
> types we do need in diverse places - but they should be described
> differently from the string type (call it a "text" type, if hysterical
> reasons oblige us to use "string" for its encoding). They can be
> interpreted as strings, hence can serve as backing-store for a string,
> provided they respect the relevant rules of a relevant encoding.
>
> If blob[index] always returns a Unicode *character*, then blob is a
> string; if it can sometimes return one half of a UTF-16 surrogate pair
> (as is the case with QString today) or one byte of a multi-byte UTF-8
> chunk, then blob is not really a string, it's just the storage for an
> encoding of a string.
>
> What are our chances of getting this right in Qt 6 ?
> It's the 21st century - way past time we did this,
>
> Eddy.

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Marco Bubke
I am not sure it would be a good idea, because a glyph can still be composed of 
more than one code point, and that is language dependent. Sometimes you want 
characters, sometimes code points and sometimes glyphs, etc. Would it not be 
better to use a simple container and then functions on top which use a view, so 
we could use them with any container? That way we would avoid any allocations 
for transforming characters from one container to the other. But anyway, I 
think strings have so many uses that one class to tackle all these problems is 
not enough.


From: Development  on behalf of Edward 
Welbourne 
Sent: Wednesday, January 23, 2019 2:53:00 PM
To: Arnaud Clère; Thiago Macieira
Cc: development@qt-project.org
Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString

All of this discussion ignores a major elephant: QString's indexing is
by 16-bit UTF-16 tokens, not by Unicode characters.  We've had Unicode
for a couple of decades now.

We *should* have a string type (I don't care what you call it) that acts
on strings indexed by Unicode characters, not in terms of a
representation.  Whether that string type internally uses UTF-16 or
UTF-8 should be invisible to its user.  Ideally it would be capable of
carrying its data internally in either form (so as to avoid needless
conversion when both producer and consumer use the same form) and of
converting between the two (e.g. so as to append efficiently) as needed.

Meanwhile, buffers of data (whether 8-bit, 16-bit or of other sizes) are
types we do need in diverse places - but they should be described
differently from the string type (call it a "text" type, if hysterical
reasons oblige us to use "string" for its encoding).  They can be
interpreted as strings, hence can serve as backing-store for a string,
provided they respect the relevant rules of a relevant encoding.

If blob[index] always returns a Unicode *character*, then blob is a
string; if it can sometimes return one half of a UTF-16 surrogate pair
(as is the case with QString today) or one byte of a multi-byte UTF-8
chunk, then blob is not really a string, it's just the storage for an
encoding of a string.

What are our chances of getting this right in Qt 6 ?
It's the 21st century - way past time we did this,

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Edward Welbourne
All of this discussion ignores a major elephant: QString's indexing is
by 16-bit UTF-16 tokens, not by Unicode characters.  We've had Unicode
for a couple of decades now.

We *should* have a string type (I don't care what you call it) that acts
on strings indexed by Unicode characters, not in terms of a
representation.  Whether that string type internally uses UTF-16 or
UTF-8 should be invisible to its user.  Ideally it would be capable of
carrying its data internally in either form (so as to avoid needless
conversion when both producer and consumer use the same form) and of
converting between the two (e.g. so as to append efficiently) as needed.

Meanwhile, buffers of data (whether 8-bit, 16-bit or of other sizes) are
types we do need in diverse places - but they should be described
differently from the string type (call it a "text" type, if hysterical
reasons oblige us to use "string" for its encoding).  They can be
interpreted as strings, hence can serve as backing-store for a string,
provided they respect the relevant rules of a relevant encoding.

If blob[index] always returns a Unicode *character*, then blob is a
string; if it can sometimes return one half of a UTF-16 surrogate pair
(as is the case with QString today) or one byte of a multi-byte UTF-8
chunk, then blob is not really a string, it's just the storage for an
encoding of a string.
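
A minimal sketch of the distinction, using std::u16string as a stand-in for
UTF-16 storage (it is not QString, and codePointAt is a hypothetical helper):
plain indexing hands back code units, so a character outside the BMP shows up
as two separate surrogate halves unless something reassembles them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: the code point that starts at code unit index i.
// Falls back to the raw unit when i does not start a valid surrogate pair.
char32_t codePointAt(const std::u16string &s, std::size_t i)
{
    const char16_t hi = s[i];
    if (isHighSurrogate(hi) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                       + (char32_t(s[i + 1]) - 0xDC00);
    return hi;
}
```

For the storage u"a\U0001F600b", size() is 4 code units for 3 code points;
s[1] and s[2] are each half of a pair, while codePointAt(s, 1) yields U+1F600.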

What are our chances of getting this right in Qt 6 ?
It's the 21st century - way past time we did this,

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Arnaud Clère
> -Original Message-
> From: Thiago Macieira  
>
> On Tuesday, 22 January 2019 09:01:16 PST Arnaud Clère wrote:
> > QByteArray is the official way to deal with utf8 strings but:
> > 1. This discussion shows it is not as known as it should be and I 
> > argue the name does not help
> > 2. Dealing with binary data and all kind 
> > of string encodings in a single class is error-prone
>
> And yet that's what we used to have in Qt 3 (remember QCString?). We went 
> away from it for a reason.

Sorry no, I never used Qt3. I just googled it looking for problems and only 
found ones that should be solved now by QByteArray:
- explicit sharing
- bad performance due to append() being O(length()) since it scans for a null 
terminator

> And 3: some character-mutating operations in QByteArray (toUpper, etc.) are 
> Latin1, not UTF-8.

A QUtf8String could override toUpper() and toLower(), whose Latin1 behaviour is 
unfortunate if QByteArray really is the official way to deal with utf-8 strings...

> > Hence my suggestion of adding a QUtf8String deriving from QByteArray...
> Not likely to happen. If we add a QUtf8String, it will be like QLatin1String, 
> which in turn was meant to be similar to QStringView, not like QString. That 
> means no mutation and no owning memory.

The use case I am talking about is really a mutable utf8 container, even though 
it could provide a QUtf8StringLiteral macro similar to QByteArrayLiteral. I do 
not understand why a QUtf8String should necessarily be like a QLatin1String.

OTOH, I would love to be able to manipulate QLatin1String/QUtf8String with a 
QStringView when dealing with possibly non-ASCII content. But QStringView seems 
to require knowing the number of remaining Unicode characters in constant time 
so I guess it is out of the question...

> And I don't want to add QUtf8String until SG16's char8_t gets settled. It'll 
> probably be settled by C++20, which means we can probably work on this during 
> Qt 6 lifetime, possibly even 6.1 or 6.2.

It makes sense to avoid future incompatibilities with the standard but 
fortunately Qt sometimes chooses to solve real problems ahead in time  ;-)





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 09:01:16 PST Arnaud Clère wrote:
> QByteArray is the official way to deal with utf8 strings but:
> 1. This discussion shows it is not as known as it should be and I argue the
> name does not help 
> 2. Dealing with binary data and all kind of string
> encodings in a single class is error-prone

And yet that's what we used to have in Qt 3 (remember QCString?). We went away 
from it for a reason.

And 3: some character-mutating operations in QByteArray (toUpper, etc.) are 
Latin1, not UTF-8.

> Hence my suggestion of adding a QUtf8String deriving from QByteArray...

Not likely to happen. If we add a QUtf8String, it will be like QLatin1String, 
which in turn was meant to be similar to QStringView, not like QString. That 
means no mutation and no owning memory.

And I don't want to add QUtf8String until SG16's char8_t gets settled. It'll 
probably be settled by C++20, which means we can probably work on this during 
Qt 6 lifetime, possibly even 6.1 or 6.2.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 11:02:22 PST Matthew Woehlke wrote:
> On 18/01/2019 11.09, Thiago Macieira wrote:
> > As for strings, the QString constructor takes UTF-8 input, but however
> > fast the decoder is, it's still slightly slower than the Latin1 decoder.
> > So if your string is purely US-ASCII, using QLatin1String is recommended.
> 
> ...but I assume QStringLiteral remains even faster? (I would think so;
> not only is *no* decoding needed, which you could also get just by using
> wide string literals, but also no *allocation*...)

Yes. In terms of CPU cycles, for a given string length of US-ASCII content:

QUtf8::convertToUnicode > qt_from_latin1 > memcpy > ∅
(fromUtf8, fromLatin1, fromUtf16, QStringLiteral)

The empty set symbol indicates that QStringLiteral requires no operation on 
the content (O(1) in length). The others are O(n).
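
A minimal sketch of why the middle two steps differ in cost. The functions
below are hypothetical stand-ins written against std::u16string; they are not
Qt's qt_from_latin1 or its fromUtf16 path, which are vectorized. Latin1 input
needs every byte widened to 16 bits, while UTF-16 input is already in the
target representation and can be block-copied:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a Latin1 decoder: each byte is zero-extended to one UTF-16
// code unit, because Latin1 values coincide with U+0000..U+00FF.
std::u16string fromLatin1(const char *data, std::size_t n)
{
    std::u16string out(n, u'\0');
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<unsigned char>(data[i]);
    return out;
}

// Stand-in for the UTF-16 case: no per-character work, just a block copy.
std::u16string fromUtf16(const char16_t *data, std::size_t n)
{
    std::u16string out(n, u'\0');
    std::memcpy(&out[0], data, n * sizeof(char16_t));
    return out;
}
```

Both are O(n) in the string length; the difference is the work done per code
unit. A UTF-8 decoder additionally has to branch on multi-byte sequences.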

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Matthew Woehlke
On 18/01/2019 11.09, Thiago Macieira wrote:
> As for strings, the QString constructor takes UTF-8 input, but however fast 
> the decoder is, it's still slightly slower than the Latin1 decoder. So if 
> your string is purely US-ASCII, using QLatin1String is recommended.

...but I assume QStringLiteral remains even faster? (I would think so;
not only is *no* decoding needed, which you could also get just by using
wide string literals, but also no *allocation*...)

-- 
Matthew


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Arnaud Clère
> -Original Message-
> From: Jason H  
>
> > From: "Arnaud Clère" 
> >
> > > -Original Message-
> > > From: Allan Sandfeld Jensen 
> > >
> > > Use QByteArray when you can.
> > 
> > I think a QUtf8String class derived from QByteArray would help a lot making 
> > this happen in the real world!
>
> Feel free to burn this suggestion with fire, but what about:
>
> typedef QSymbolSequence QLatin1String;
> typedef QSymbolSequence QByteArray; 
> typedef QSymbolSequence QByteArray; 
> typedef QSymbolSequence QString;
>
> So they can have the same API? It really seems to me that the issue is 
> storage, not that they need a different API to operate on the storage. 

This is close to QStringView and it would be nice to be able to build one from 
QByteArray/QUtf8String to access utf8 characters as QChar on the fly. It would 
avoid most MBCS problems with utf8 strings. Unfortunately, I am afraid this is 
not possible for QStringView since it must know the number of remaining 
characters, and utf8 requires decoding the whole string to know that.
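
A minimal sketch of that cost, assuming well-formed UTF-8 held in a std::string
(utf8CodePointCount is a hypothetical helper, not a Qt function): the byte
count is known up front, but the character count needs a pass over every byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by counting every byte that is not a continuation byte
// (continuation bytes have the bit pattern 10xxxxxx). This is O(n).
std::size_t utf8CodePointCount(const std::string &utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For "niños." encoded as UTF-8, size() is 7 bytes but the count is 6 code
points, and the only way to find that out is the scan above.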

My point was not as ambitious:
QByteArray is the official way to deal with utf8 strings but:
1. This discussion shows it is not as well known as it should be and I argue the 
name does not help
2. Dealing with binary data and all kinds of string encodings in a single class 
is error-prone

Hence my suggestion of adding a QUtf8String deriving from QByteArray...

I have no idea if it would be feasible though



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 06:49:51 PST Jason H wrote:
> typedef QSymbolSequence QLatin1String;
> typedef QSymbolSequence QByteArray;
> typedef QSymbolSequence QByteArray;
> typedef QSymbolSequence QString;
> 
> So they can have the same API? It really seems to me that the issue is
> storage, not that they need a different API to operate on the storage.

That QSymbolSequence template class does not exist and is not easy to 
implement. Storage is not the problem, it's actually the algorithms that 
operate on and transform the contents. They'd have to be rewritten for each of 
the four.

Go ahead and give it a try, though. This may also be what SG16 intends for 
C++23, so it may be an interesting trial run.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Jason H


> Sent: Monday, January 21, 2019 at 9:51 AM
> From: "Arnaud Clère" 
> To: "Allan Sandfeld Jensen" , "development@qt-project.org" 
> 
> Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString
>
> > -Original Message-
> > From: Allan Sandfeld Jensen  
> >
> > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> > > Any chance of having UTF-8 storage support for QString?
> > > 
> > Use QByteArray when you can.
> 
> I think a QUtf8String class derived from QByteArray would help a lot making 
> this happen in the real world!
> 1. It would be found more easily by users in need of a utf8 encoded dynamic 
> string
> 2. It would allow making the encoding explicit (QString or QUtf8String or 
> QLatin1String) in newer Qt APIs or user-defined ones, and even totally safe 
> if disabling const char * casts is possible
> 3. It would allow adding QString-like APIs (like setNum(), simplified(), 
> etc.) over the time without cluttering QByteArray
> 
> Moreover, I have a specific use-case where QByteArray args are used as binary 
> data (say CBOR) and a specific Utf8String is useful to handle utf8 encoded 
> args without always encoding/decoding to utf16.
> I might not be the only one...


Feel free to burn this suggestion with fire, but what about:

typedef QSymbolSequence QLatin1String;
typedef QSymbolSequence QByteArray;
typedef QSymbolSequence QByteArray;
typedef QSymbolSequence QString;

So they can have the same API? It really seems to me that the issue is storage, 
not that they need a different API to operate on the storage. 




Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-21 Thread Arnaud Clère
> -Original Message-
> From: Allan Sandfeld Jensen  
>
> On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> > Any chance of having UTF-8 storage support for QString?
> > 
> Use QByteArray when you can.

I think a QUtf8String class derived from QByteArray would help a lot making 
this happen in the real world!
1. It would be found more easily by users in need of a utf8 encoded dynamic 
string
2. It would allow making the encoding explicit (QString or QUtf8String or 
QLatin1String) in newer Qt APIs or user-defined ones, and even totally safe if 
disabling const char * casts is possible
3. It would allow adding QString-like APIs (like setNum(), simplified(), etc.) 
over time without cluttering QByteArray

Moreover, I have a specific use-case where QByteArray args are used as binary 
data (say CBOR) and a specific Utf8String is useful to handle utf8 encoded args 
without always encoding/decoding to utf16.
I might not be the only one...


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 08:57:19 PST Tor Arne Vestbø wrote:
> > On 18 Jan 2019, at 17:21, Thiago Macieira 
> > Actually, what we should do is allow everywhere
> > 
> > functionTakingString(u"Tor Arne Vestbø")
> > // (note the u)
> 
> Yes, this would be awesome! Please let’s do this 
> 
> And I guess without QT_NO_CAST_FROM_ASCII you’d still be able to do:
> 
>   functionTakingString("Tor Arne Vestbø") // without the 'u', runtime cost

Right, but given the benefit of char16_t literals, we should encourage 
QT_NO_CAST_FROM_ASCII even more! It's a single extra letter in your source and 
even if the compiler is misconfigured and is producing mojibake for your 
surname, my middle name or Jędrzej's first name, it will still work for 
US-ASCII content ("a broken clock is right twice a day" type of "work").

In fact, we ought to look into replacing our QLatin1String content with 
char16_t literals in our sources. Pros: avoid the Latin1 decoder, which is 
slower[¹] than a pure memcpy. Cons: doubles the size of the string. So I'd use 
QLatin1String only for uncommonly used strings, where saving a few bytes is 
worth it.


[¹] see https://analysis.godbolt.org/z/OZ-5Gz, which contains the inner loop 
of qt_from_latin1_internal (an AVX2 build[²]) and compare to an equivalent 
memcpy in https://analysis.godbolt.org/z/7vR2jW. Note how the memcpy loop 
according to llvm-mca has 3 cycles fewer of latency than the latin1 decoder. 
And this is not an optimal memcpy loop.

[²] Our builds are not AVX2 by default. You're only going to get this 
performance if you build with -march=native (Gentoo?) or you use Clear Linux. 
The defaults are much worse.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 08:13:40 PST Kai Koehne wrote:
> 1. We generally compile Qt code with QT_NO_CAST_FROM_ASCII that disables the
> QString(const char *) overload. And we do that so that you have to make it
> explicit whether you really want to do the implicit conversion from UTF-8
> to UTF-16, use QStringLiteral() to encode it as UTF-16 at compile time, or
> rather have it translated with a  tr() call.
> 
> I think for Qt code explicit is better than implicit, so I actually would
> stay with QT_NO_CAST_FROM_ASCII.

Actually, what we should do is allow everywhere

functionTakingString(u"Tor Arne Vestbø")
// (note the u)

Which causes the compiler to encode the string in UTF-16, bypassing the need 
for runtime decoding, and enforcing sources as UTF-8, so we get consistent 
binaries. It's just one step short of QStringLiteral in that it will still 
allocate memory, but it only needs a memcpy. Such code also works with 
functions taking QStringView without memory allocation.
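
A minimal sketch of what the u prefix buys, in plain C++ without Qt. codeUnits
is a hypothetical helper, and the ø is written as the escape \u00F8 so that the
sketch does not depend on the source file's encoding:

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null.
template <typename Char, std::size_t N>
constexpr std::size_t codeUnits(const Char (&)[N]) { return N - 1; }

// The compiler emits UTF-16 for u"..." literals: one 16-bit unit per
// character here, with nothing left to decode at run time.
static_assert(codeUnits(u"Tor Arne Vestb\u00F8") == 15, "15 UTF-16 code units");

// The same text as a UTF-8 literal needs two bytes for U+00F8, and would
// have to go through a UTF-8 decoder to become UTF-16.
static_assert(codeUnits(u8"Tor Arne Vestb\u00F8") == 16, "16 UTF-8 code units");
```

The UTF-16 literal occupies more bytes in the binary (2 per unit), which is the
same size tradeoff that applies to any 16-bit string data.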

We all know that QStringLiteral has drawbacks when it comes to unloading 
modules. For QtCore, obviously QStringLiteral is not a problem, but other 
modules may decide to avoid it.

PS: I still want to improve QStringLiteral, but it will still be different 
from a pure char16_t literal.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Kai Koehne
> -Original Message-
> From: Development  On Behalf Of Tor
> Arne Vestbø
> Sent: Friday, January 18, 2019 4:27 PM
> To: Jedrzej Nowacki 
> Cc: Thiago Macieira ; development@qt-
> project.org
> Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString
> 
> Picking up on this:
> 
> If we plan to standardise on our Qt source code being UTF8, can we please
> allow QString("Tor Arne Vestbø") without going through
> QLatin1Literal/QStringLiteral/QLatin1String/etc etc?

I think you're touching two different things here:

1. We generally compile Qt code with QT_NO_CAST_FROM_ASCII that disables the 
QString(const char *) overload. And we do that so that you have to make it 
explicit whether you really want to do the implicit conversion from UTF-8 to 
UTF-16, use QStringLiteral() to encode it as UTF-16 at compile time, or rather 
have it translated with a  tr() call.

I think for Qt code explicit is better than implicit, so I actually would stay 
with QT_NO_CAST_FROM_ASCII.

2. We require all Qt source code to be ASCII only. This is AFAIK mostly because 
of the editor in Visual Studio, which even in its latest incarnation doesn't 
have a global option to save files in UTF-8 instead of 
.

Here I'm not sure anymore whether being conservative buys us much. VS after all 
has a heuristic to open a UTF-8 encoded file correctly, so the problem mostly 
is that people might create a new file with non-UTF-8 content, or copy it from 
another project.

Regards

Kai


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 07:26:51 PST Tor Arne Vestbø wrote:
> If we plan to standardise on our Qt source code being UTF8, can we please
> allow QString("Tor Arne Vestbø") without going through
> QLatin1Literal/QStringLiteral/QLatin1String/etc etc?

I think we now can. The last problem we had was MSVC pre-2015 update 2, which 
added the /utf-8 switch. Without that option, any non-ASCII character in the 
source code, even in comments, could cause compilation errors by causing a 
decoding error in whichever codepage the user used in his/her OS.

I think all our builds now use /utf-8, which means UTF-8 is permitted 
everywhere now. You can use it in comments ("Copyright Klarälvdalens ...", for 
example) and in strings. Please don't use it in identifiers.

As for strings, the QString constructor takes UTF-8 input, but however fast 
the decoder is, it's still slightly slower than the Latin1 decoder. So if your 
string is purely US-ASCII, using QLatin1String is recommended.

PS: we don't need SG16's char8_t, but we'll need to add support for it.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Tor Arne Vestbø
Picking up on this:

If we plan to standardise on our Qt source code being UTF8, can we please allow 
QString("Tor Arne Vestbø") without going through 
QLatin1Literal/QStringLiteral/QLatin1String/etc etc?

Tor Arne 

> On 18 Jan 2019, at 16:01, Jedrzej Nowacki  wrote:
> 
> On Wednesday, 16 January 2019 21:12:55 CET André Pönitz wrote:
>> On Tue, Jan 15, 2019 at 10:44:45PM +0100, Allan Sandfeld Jensen wrote:
>>> On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
>>>> Hi,
>>>> 
>>>> With every Qt release we see how the new release improved over previous
>>>> releases in terms of speed, memory consumption, etc.
>>>> 
>>>> Any chance of having UTF-8 storage support for QString?
>>> 
>>> Use QByteArray when you can.
>> 
>> Unfortunately, quite a few APIs require to use QString, even if
>> the typical use case would be completely fine even with ASCII,
>> like keys in QVariantMap or QSettings.
>> 
>> Andre'
> 
> As a travelling person with name that can not be represented with latin1, I 
> can tell you some funny stories about systems that authors thought that 
> "ascii 
> is enough". Unless you want to keep only hex codes or sha1s, please use 
> bigger 
> character set.
> 
> Cheers,
>  Jędrek
> 
> 



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Jedrzej Nowacki
On Wednesday, 16 January 2019 21:12:55 CET André Pönitz wrote:
> On Tue, Jan 15, 2019 at 10:44:45PM +0100, Allan Sandfeld Jensen wrote:
> > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> > > Hi,
> > > 
> > > With every Qt release we see how the new release improved over previous
> > > releases in terms of speed, memory consumption, etc.
> > > 
> > > Any chance of having UTF-8 storage support for QString?
> > 
> > Use QByteArray when you can.
> 
> Unfortunately, quite a few APIs require to use QString, even if
> the typical use case would be completely fine even with ASCII,
> like keys in QVariantMap or QSettings.
> 
> Andre'

As a travelling person with a name that cannot be represented in Latin1, I 
can tell you some funny stories about systems whose authors thought that "ascii 
is enough". Unless you want to keep only hex codes or sha1s, please use a bigger 
character set.

Cheers,
  Jędrek




Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-17 Thread Thiago Macieira
On Thursday, 17 January 2019 13:27:40 PST Martin Koller wrote:
> On Wednesday, 16 January 2019 19:44:27 CET Konstantin Tokarev wrote:
> > From QtWebKit perspective it would be great if Qt APIs which require
> > QString now would also accept QLatin1String at least for ASCII-only data
> 
> is QtWebKit still alive ?
> Seems there is nobody working on it for more than a year...

Konstantin is the maintainer, but I haven't seen releases recently, so it's 
not something I could recommend depending on.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-17 Thread Martin Koller
On Wednesday, 16 January 2019 19:44:27 CET Konstantin Tokarev wrote:

> From the QtWebKit perspective it would be great if Qt APIs which currently
> require QString would also accept QLatin1String, at least for ASCII-only data

Is QtWebKit still alive?
It seems nobody has been working on it for more than a year...

-- 
Best regards/Schöne Grüße

Martin
A: Because it breaks the logical sequence of discussion
Q: Why is top posting bad?

()  ascii ribbon campaign - against html e-mail 
/\- against proprietary attachments





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Thiago Macieira
On Wednesday, 16 January 2019 13:16:39 PST Konstantin Tokarev wrote:
> 1. Code points may be encoded as surrogate pairs in UTF-16, e.g. this is the
> case for Emoji characters. QString ignores this fact, indexing 16-bit
> QChars. To make things worse, several QString methods like left(), right(),
> and mid() will happily cut a surrogate pair in half.

So does QByteArray, and so would a UTF-8-based QString, except that it would
happen for a lot more characters.
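
A minimal sketch of that point in plain standard C++ (deliberately not Qt
code; the helper names isContinuationByte and cutSplitsCharacter and the
sample string are made up for illustration). In UTF-8 every non-ASCII
character occupies more than one byte, so a byte-indexed cut can land inside
any of them, not only inside characters beyond the BMP:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the byte is a UTF-8 continuation byte (10xxxxxx).
bool isContinuationByte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: true if cutting `utf8` at byte offset `pos` would
// separate a lead byte from one of its continuation bytes.
bool cutSplitsCharacter(const std::string &utf8, std::size_t pos)
{
    return pos < utf8.size()
        && isContinuationByte(static_cast<unsigned char>(utf8[pos]));
}

// Sample data: "Jędrek" in UTF-8. U+0119 (ę) is well inside the BMP, yet it
// takes two bytes (0xC4 0x99). The literal is split so that "\x99" is not
// parsed together with the following 'd'.
std::string sampleName()
{
    return std::string("J\xC4\x99" "drek");
}
```

With this sample, sampleName().size() is 7 for six characters, and a cut at
byte offset 2 falls between the two bytes of ę, while cuts at offsets 1 and 3
are safe.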

What you want is QTextBoundaryFinder and possibly QFontMetrics.

> 2. When people talk about character indexing they often mean indexing of
> grapheme clusters. In the Unicode world a grapheme cluster may be
> represented as several code points, depending on the normalization form of
> the source. To make things worse, even in NFC not every grapheme cluster
> that is possible in Unicode is representable as a single code point.

Indeed, and SG16 (the Unicode study group of the C++ standards committee) is
looking into grapheme clusters as the basic unit. Unfortunately, their work
does not line up with our Qt 6 timelines, nor would we be able to adapt that
quickly, given how much code there is using QString.

We should pay attention to the SG16 work and make sure it works with Qt 6, 
with eyes towards a better API in Qt 7.

Nowhere did I say that we should use UTF-8.
-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev


15.01.2019, 23:13, "Alexander Akulich" :
> Cristian,
>
> the previous discussion is "Why can't QString use UTF-8 internally?"
> There is something wrong with our mailing list archive; the best link I
> found is [1]. For some reason the link to the thread head [2] is broken.
>
> [1] 
> https://lists.qt-project.org/pipermail/development/2015-February/040199.html

Note that if anyone wants to use easier character indexing as an argument for
using UTF-16 instead of UTF-8, that argument does not hold.

1. Code points may be encoded as surrogate pairs in UTF-16, e.g. this is the
case for Emoji characters. QString ignores this fact, indexing 16-bit QChars.
To make things worse, several QString methods like left(), right(), and mid()
will happily cut a surrogate pair in half.

2. When people talk about character indexing they often mean indexing of
grapheme clusters. In the Unicode world a grapheme cluster may be represented
as several code points, depending on the normalization form of the source.
To make things worse, even in NFC not every grapheme cluster that is possible
in Unicode is representable as a single code point.
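
Both points can be sketched with standard C++ and std::u16string standing in
for QString's 16-bit storage. This is an illustration only, not Qt code:
countCodePoints and leftUnits are hypothetical helpers, and leftUnits merely
imitates a cut that counts code units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the UTF-16 code unit is a high (leading) surrogate, U+D800..U+DBFF.
bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

// True if the UTF-16 code unit is a low (trailing) surrogate, U+DC00..U+DFFF.
bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Counts code points by treating each well-formed surrogate pair as one.
std::size_t countCodePoints(const std::u16string &s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            ++i; // skip the low half of the pair
        ++n;
    }
    return n;
}

// Naive cut that keeps the first n code units, with no regard for pairs.
std::u16string leftUnits(const std::u16string &s, std::size_t n)
{
    return s.substr(0, n);
}
```

For the emoji U+1F600 the string holds 2 code units but 1 code point, and
leftUnits(s, 1) returns a lone high surrogate. For "e" followed by U+0301
(combining acute accent) both counts are 2, although a reader sees a single
character; that is the grapheme-cluster problem from point 2.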

> [2] 
> https://lists.qt-project.org/pipermail/development/2015-February/020155.html


>
> On Tue, Jan 15, 2019 at 9:48 PM Cristian Adam  wrote:
>>  Hi,
>>
>>  With every Qt release we see how the new release improved over previous 
>> releases in terms of speed, memory consumption, etc.
>>
>>  Any chance of having UTF-8 storage support for QString?
>>
>>  UTF-8 is native on Linux and other *NIX platforms, so Qt programs should
>> use less memory and perform better by reading fewer bytes from memory.
>>
>>  Did anybody try this?
>>
>>  I've heard that Qt Creator is storing source files both in UTF-8 format
>> for libclang, and in UTF-16 for its internal usage. That sounds a bit
>> wasteful.
>>
>>  KDE Plasma could then better compare / compete with the other Linux desktop 
>> environments which use UTF-8 for strings.
>>
>>  I guess I could use CopperSpice to test this, since they added CsString 
>> with both QString8 (UTF-8) and QString16 (UTF-16) supported.
>>
>>  https://utf8everywhere.org/ states "UTF-16 is the worst of both worlds, 
>> being both variable length and too wide"
>>
>>  Cheers,
>>  Cristian.

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread André Pönitz
On Tue, Jan 15, 2019 at 10:44:45PM +0100, Allan Sandfeld Jensen wrote:
> On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> > Hi,
> > 
> > With every Qt release we see how the new release improved over previous
> > releases in terms of speed, memory consumption, etc.
> > 
> > Any chance of having UTF-8 storage support for QString?
> > 
> Use QByteArray when you can.

Unfortunately, quite a few APIs require using QString, even if
the typical use case would be completely fine with plain ASCII,
like keys in QVariantMap or QSettings.

Andre'


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Jason H
> Sent: Tuesday, January 15, 2019 at 4:44 PM
> From: "Allan Sandfeld Jensen" 
> To: development@qt-project.org
> Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString
>
> On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> > Hi,
> > 
> > With every Qt release we see how the new release improved over previous
> > releases in terms of speed, memory consumption, etc.
> > 
> > Any chance of having UTF-8 storage support for QString?
> > 
> Use QByteArray when you can.

And *I* do. (Not the OP). But I would love a QByteArray that matches QString's 
API. I wrote this email, then was going about gathering evidence to make my 
case about why QByteArray was inadequate. It seems many of my complaints with 
using QByteArray over the years have been addressed unbeknownst to me, though 
one glaring omission remains:

- QByteArray lacks QString's arg() support. The QString("%1").arg(X) 
combination is pretty readable, and reliable, and maintainable. 
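
To make the gap concrete, here is a rough sketch of what an arg()-style
helper over a byte string could look like. It is written against std::string
rather than QByteArray, argBytes and markerAt are hypothetical names, and it
only mimics the basic behaviour of replacing the lowest-numbered place marker
(%1 to %99); field widths, number formatting and localisation are not
attempted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: parses a place marker starting at tmpl[i]. Returns its
// number (1..99) and stores its length in `len`; returns 0 if tmpl[i] does
// not start a marker ("%0" and a bare '%' count as plain text).
int markerAt(const std::string &tmpl, std::size_t i, std::size_t &len)
{
    auto digit = [&](std::size_t k) {
        return k < tmpl.size() && std::isdigit(static_cast<unsigned char>(tmpl[k]));
    };
    if (i >= tmpl.size() || tmpl[i] != '%' || !digit(i + 1))
        return 0;
    int n = tmpl[i + 1] - '0';
    len = 2;
    if (n != 0 && digit(i + 2)) {
        n = n * 10 + (tmpl[i + 2] - '0');
        len = 3;
    }
    return n;
}

// Hypothetical helper, not a Qt API: replaces every occurrence of the
// lowest-numbered place marker in `tmpl` with `value`.
std::string argBytes(const std::string &tmpl, const std::string &value)
{
    int lowest = 0;
    std::size_t len = 0;
    // Pass 1: find the lowest marker number present in the template.
    for (std::size_t i = 0; i < tmpl.size(); ++i) {
        const int n = markerAt(tmpl, i, len);
        if (n != 0 && (lowest == 0 || n < lowest))
            lowest = n;
    }
    if (lowest == 0)
        return tmpl; // no marker: return the template unchanged

    // Pass 2: copy the template, substituting that marker.
    std::string out;
    for (std::size_t i = 0; i < tmpl.size();) {
        if (markerAt(tmpl, i, len) == lowest) {
            out += value;
            i += len;
        } else {
            out += tmpl[i++];
        }
    }
    return out;
}
```

Chaining works the way it reads with QString: argBytes(argBytes("%1 of %2",
"3"), "7") yields "3 of 7".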

I don't know how the average Qt user stacks up, but I only use QStrings in UIs 
(because I have to), the rest (which is a lot) is all QByteArray. When I'm 
using QString not in a UI, it's normally ending with toUtf8().

I don't really care about UTF-8 vs UTF-16 vs whatever (slight bias for UTF-8,
as it looks better in hex editors); I just hate that I have to deal with the
character-width issue this concretely, by having to code against one of two
very similar but not equivalent classes.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Thiago Macieira
On Wednesday, 16 January 2019 10:44:27 PST Konstantin Tokarev wrote:
> From the QtWebKit perspective it would be great if Qt APIs which currently
> require QString would also accept QLatin1String, at least for ASCII-only data

Which ones? Currently, the only thing that takes QLatin1String in the API is 
QString itself. Where would you like to see more QLatin1String API?

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev


15.01.2019, 21:45, "Cristian Adam" :
> Hi,
>
> With every Qt release we see how the new release improved over previous 
> releases in terms of speed, memory consumption, etc.
>
> Any chance of having UTF-8 storage support for QString?
>
> UTF-8 is native on Linux and other *NIX platforms, so Qt programs should
> use less memory and perform better by reading fewer bytes from memory.
>
> Did anybody try this?
>
> I've heard that Qt Creator is storing source files both in UTF-8 format for
> libclang, and in UTF-16 for its internal usage. That sounds a bit wasteful.
>
> KDE Plasma could then better compare / compete with the other Linux desktop 
> environments which use UTF-8 for strings.
>
> I guess I could use CopperSpice to test this, since they added CsString with 
> both QString8 (UTF-8) and QString16 (UTF-16) supported.
>
> https://utf8everywhere.org/ states "UTF-16 is the worst of both worlds, being 
> both variable length and too wide"

From the QtWebKit perspective it would be great if Qt APIs which currently
require QString would also accept QLatin1String, at least for ASCII-only data

>
> Cheers,
> Cristian.


-- 
Regards,
Konstantin


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev


16.01.2019, 00:46, "Allan Sandfeld Jensen" :
> On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
>>  Hi,
>>
>>  With every Qt release we see how the new release improved over previous
>>  releases in terms of speed, memory consumption, etc.
>>
>>  Any chance of having UTF-8 storage support for QString?
>
> Use QByteArray when you can.

The problem is that many Qt APIs require QString.

>
> Regards
> 'Allan

-- 
Regards,
Konstantin



Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Edward Welbourne
Marco Bubke (16 January 2019 10:59) reported:
>> https://utf8everywhere.org/ states "UTF-16 is the worst of both
>> worlds, being both variable length and too wide"

Konstantin Ritt (16 January 2019 17:50) replied
> https://utf8everywhere.org/ states bullshit. Try reading alternative
> sources.

At the very least, one might guess from the site name that it's possible
the site does not "speak in a neutral voice" upon the subject; it has a
clear bias in favour of UTF-8.

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Ritt
> https://utf8everywhere.org/ states *"UTF-16 is the worst of both worlds,
> being both variable length and too wide"*

https://utf8everywhere.org/ *states bullshit. Try reading alternative
sources.*


Regards,
Konstantin


Wed, 16 Jan 2019 at 13:20, Edward Welbourne :

> Marco Bubke (16 January 2019 10:59)
> > You can use std::string, which has small string optimization, instead of
> > QByteArray too. In many cases where you would use const String &
> > you can use std::string_view, so you are more flexible.
>
> Note that we now have a QStringView, which can likewise replace many
> uses of const QString & - not that this is any help with UTF-8.
> Uptake has been slow, but some of us are using it.
>
> Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Edward Welbourne
Marco Bubke (16 January 2019 10:59)
> You can use std::string, which has small string optimization, instead of
> QByteArray too. In many cases where you would use const String &
> you can use std::string_view, so you are more flexible.

Note that we now have a QStringView, which can likewise replace many
uses of const QString & - not that this is any help with UTF-8.
Uptake has been slow, but some of us are using it.

Eddy.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Marco Bubke
You can use std::string, which has small string optimization, instead of
QByteArray too. In many cases where you would use const String & you can use
std::string_view, so you are more flexible.
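
A small sketch of the std::string_view suggestion, assuming C++17; countChar
is a made-up example function. A single view parameter accepts a string
literal, a std::string, or a slice of either, without copying the characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Counts occurrences of `ch`. Taking std::string_view instead of
// const std::string & means callers need not construct a std::string first.
std::size_t countChar(std::string_view text, char ch)
{
    std::size_t n = 0;
    for (char c : text) {
        if (c == ch)
            ++n;
    }
    return n;
}
```

The view does not own its data, so it must not outlive the buffer it points
into; that is the usual price for the flexibility.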


From: Development  on behalf of Allan 
Sandfeld Jensen 
Sent: Tuesday, January 15, 2019 10:44:45 PM
To: development@qt-project.org
Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString

On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> Hi,
>
> With every Qt release we see how the new release improved over previous
> releases in terms of speed, memory consumption, etc.
>
> Any chance of having UTF-8 storage support for QString?
>
Use QByteArray when you can.

Regards
'Allan




Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Allan Sandfeld Jensen
On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote:
> Hi,
> 
> With every Qt release we see how the new release improved over previous
> releases in terms of speed, memory consumption, etc.
> 
> Any chance of having UTF-8 storage support for QString?
> 
Use QByteArray when you can.

Regards
'Allan




Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Alexander Akulich
Cristian,

the previous discussion is "Why can't QString use UTF-8 internally?"
There is something wrong with our mailing list archive; the best link I
found is [1]. For some reason the link to the thread head [2] is broken.

[1] https://lists.qt-project.org/pipermail/development/2015-February/040199.html
[2] https://lists.qt-project.org/pipermail/development/2015-February/020155.html

On Tue, Jan 15, 2019 at 9:48 PM Cristian Adam  wrote:
>
> Hi,
>
> With every Qt release we see how the new release improved over previous 
> releases in terms of speed, memory consumption, etc.
>
> Any chance of having UTF-8 storage support for QString?
>
> UTF-8 is native on Linux and other *NIX platforms, so Qt programs should
> use less memory and perform better by reading fewer bytes from memory.
>
> Did anybody try this?
>
> I've heard that Qt Creator is storing source files both in UTF-8 format for
> libclang, and in UTF-16 for its internal usage. That sounds a bit wasteful.
>
> KDE Plasma could then better compare / compete with the other Linux desktop 
> environments which use UTF-8 for strings.
>
> I guess I could use CopperSpice to test this, since they added CsString with 
> both QString8 (UTF-8) and QString16 (UTF-16) supported.
>
> https://utf8everywhere.org/ states "UTF-16 is the worst of both worlds, being 
> both variable length and too wide"
>
> Cheers,
> Cristian.


Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Thiago Macieira
On Tuesday, 15 January 2019 10:43:57 PST Cristian Adam wrote:
> Any chance of having UTF-8 storage support for QString?

No.

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center





[Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Cristian Adam
Hi,

With every Qt release we see how the new release improved over previous
releases in terms of speed, memory consumption, etc.

Any chance of having UTF-8 storage support for QString?

UTF-8 is native on Linux and other *NIX platforms, so Qt programs should use
less memory and perform better by reading fewer bytes from memory.
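
A back-of-the-envelope sketch of the memory claim in standard C++ (not Qt;
utf8Bytes and utf16Bytes are hypothetical helpers, and a 2-byte char16_t is
assumed, which holds on mainstream platforms). It counts payload bytes only
and ignores allocation overhead:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Payload size in bytes of a UTF-8 string (one byte per code unit).
std::size_t utf8Bytes(const std::string &s)
{
    return s.size();
}

// Payload size in bytes of a UTF-16 string (sizeof(char16_t) per code unit).
std::size_t utf16Bytes(const std::u16string &s)
{
    return s.size() * sizeof(char16_t);
}
```

For the ASCII word "QString" this gives 7 bytes in UTF-8 against 14 in
UTF-16. For the two CJK characters U+4E2D U+6587 it gives 6 bytes in UTF-8
against 4 in UTF-16, so the saving depends on what the text contains.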

Did anybody try this?

I've heard that Qt Creator is storing source files both in UTF-8 format
for libclang, and in UTF-16 for its internal usage. That sounds a bit
wasteful.

KDE Plasma could then better compare / compete with the other Linux desktop
environments which use UTF-8 for strings.

I guess I could use CopperSpice to test this, since they added CsString
with both QString8 (UTF-8) and QString16 (UTF-16) supported.

https://utf8everywhere.org/ states *"UTF-16 is the worst of both worlds,
being both variable length and too wide"*

Cheers,
Cristian.