Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-27 Thread Arnaud Clère
> -Original Message- > From: Jason H > Sent: Friday, 25 January 2019 17:40 > Cc: development@qt-project.org > Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString > > > By all means, let's make sure the internals are efficient for the mo

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 13:39:49 PST Konstantin Tokarev wrote: > > All living languages are supposed to be stored in the BMP, which means no > > UTF-16 surrogate pairs to encode them. > > AFAIK all emojis are encoded with surrogate pairs Emojis are not part of a living language. They're

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Konstantin Tokarev
25.01.2019, 23:33, "Thiago Macieira" : > On Friday, 25 January 2019 04:54:22 PST Edward Welbourne wrote: >>  we >>  fail to properly support cultures whose scripts are relegated to the >>  outer planes of Unicode - as, for example, the Chakma language's number >>  system > > All living languages

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 04:54:22 PST Edward Welbourne wrote: > we > fail to properly support cultures whose scripts are relegated to the > outer planes of Unicode - as, for example, the Chakma language's number > system All living languages are supposed to be stored in the BMP, which means no

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Thiago Macieira
On Friday, 25 January 2019 08:54:38 PST Konstantin Tokarev wrote: > > How often do you need that, outside of QString itself? And maybe a few > > efficient QtCore classes? (QCborValue comes to mind) > > Each time I need to interact efficiently with external code which isn't > Qt-based, e.g. WebKit,

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Giuseppe D'Angelo via Development
On 25/01/19 10:49, Dominik Haumann wrote: Sidenote: Such a QStringIterator would also be helpful for KTextEditor, where we likely have some bugs we usually never see since we never have > UTF16 or composed characters. I've managed to merge it in QtCore some 5 years ago, comes with docs

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Konstantin Tokarev
25.01.2019, 01:02, "Thiago Macieira" : > On Thursday, 24 January 2019 05:06:58 PST Konstantin Tokarev wrote: >>  I will be officially pissed off if possibility to access raw data of QString >>  without extra copy is gone It would be better if there is a way to figure >>  out internal storage

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Jason H
> By all means, let's make sure the internals are efficient for the more > common languages and scripts; but it's way past time to start doing > Unicode properly, so that all cultures are well-served by default, when > the software folk are using is built on Qt, I don't think anyone knows what

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Edward Welbourne
Arnaud Clère (25 January 2019 10:59) wrote: > Most user code I have written or seen handles text data naively and is > incorrect in some respect but I think only a minority of it is leading > to real problems because input data will rarely trigger them. That depends a lot on who's supplying your

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Arnaud Clère
> Original Message- > From: Thiago Macieira > > But we WILL NOT change from UTF-16 in the next 2 years. From a user standpoint, this seems perfectly OK to me. I do not buy the argument that if switching QString to UTF-8 makes developer bugs appear sooner, this is a good thing. Most

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-25 Thread Dominik Haumann
On Thu, Jan 24, 2019 at 10:57 PM Thiago Macieira wrote: > > On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote: > > - Introduce some iterator that iterates over unicode code points. > > I wrote that about a decade ago. It's called QStringIterator and it's inside > our sources, but

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Thiago Macieira
On Thursday, 24 January 2019 05:06:58 PST Konstantin Tokarev wrote: > I will be officially pissed off if possibility to access raw data of QString > without extra copy is gone It would be better if there is a way to figure > out internal storage encoding (e.g. isUtf16()) and access raw data How

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Thiago Macieira
On Wednesday, 23 January 2019 23:32:28 PST Olivier Goffart wrote: > - Introduce some iterator that iterates over unicode code points. I wrote that about a decade ago. It's called QStringIterator and it's inside our sources, but in a private header. But we may want to make it iterate over
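To make the idea of iterating over code points rather than 16-bit units concrete, here is a minimal sketch in standard C++. It is not QStringIterator and does not reproduce its API; `toCodePoints` is a hypothetical helper, and replacing unpaired surrogates with U+FFFD is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not Qt API): decodes UTF-16 code units into
// Unicode code points. A valid surrogate pair becomes one code point;
// an unpaired surrogate is replaced by U+FFFD (an assumption of this sketch).
inline std::vector<char32_t> toCodePoints(std::u16string_view s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char32_t u = s[i];
        const bool high = u >= 0xD800 && u <= 0xDBFF; // leading surrogate
        const bool low = u >= 0xDC00 && u <= 0xDFFF;  // trailing surrogate
        if (high && i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            const char32_t trail = s[i + 1];
            // Combine the two 10-bit halves and add the 0x10000 offset.
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (trail - 0xDC00));
            ++i; // the trailing unit has been consumed
        } else if (high || low) {
            out.push_back(0xFFFD); // unpaired surrogate
        } else {
            out.push_back(u); // BMP character, one unit per code point
        }
    }
    return out;
}
```

For example, `toCodePoints(u"a\U0001F600")` yields two code points even though the input holds three 16-bit units.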

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Konstantin Tokarev
24.01.2019, 10:34, "Olivier Goffart" : > On 23.01.19 23:15, André Pönitz wrote: >>  On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote: >>>  23.01.2019, 16:55, "Edward Welbourne" :  All of this discussion ignores a major elephant: QString's indexing is  by 16-bit UTF-16

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-24 Thread Konstantin Ritt
> - Introduce some iterator that iterates over unicode code points. QStringIterator > We *should* have a string type (I don't care what you call it) that acts > on strings indexed by Unicode characters, not in terms of a > representation. Whether that string type internally uses UTF-16 or >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Olivier Goffart
On 23.01.19 23:15, André Pönitz wrote: On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote: 23.01.2019, 16:55, "Edward Welbourne" : All of this discussion ignores a major elephant: QString's indexing is by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode for a

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread André Pönitz
On Wed, Jan 23, 2019 at 05:40:33PM +0300, Konstantin Tokarev wrote: > 23.01.2019, 16:55, "Edward Welbourne" : > > All of this discussion ignores a major elephant: QString's indexing is > > by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode > > for a couple of decades now. > > >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 06:07:37 PST Marco Bubke wrote: > Would it be not better to use a simple container and then functions on top > which use a view, so we could use them with any container If only we had a class that found boundaries in text...

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Edward Welbourne
Marco Bubke (23 January 2019 15:07) wrote > Would it be not better to use a simple container and then functions on > top which use a view, so we could use them with any container. That sounds just fine to me. Indeed, in separating the "Unicode text" nature from its encoding, I'm fine with the

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 07:25:44 PST Jason H wrote: > > From: "Arnaud Clère" > > > > > And I don't want to add QUtf8String until SG16's char8_t gets settled. > > > It'll probably be settled by C++20, which means we can probably work on > > > this during Qt 6 lifetime, possibly even 6.1 or

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Thiago Macieira
On Wednesday, 23 January 2019 05:53:00 PST Edward Welbourne wrote: > What are our chances of getting this right in Qt 6 ? Not bad. But what you described is what SG16 is working on for std::text. So let's not do something different from them. We can prototype it and be first, though. --

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Jason H
> From: "Arnaud Clère" > > And I don't want to add QUtf8String until SG16's char8_t gets settled. > > It'll probably be settled by C++20, which means we can probably work on > > this during Qt 6 lifetime, possibly even 6.1 or 6.2. > > It makes sense to avoid future incompatibilities with the

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Konstantin Tokarev
23.01.2019, 16:55, "Edward Welbourne" : > All of this discussion ignores a major elephant: QString's indexing is > by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode > for a couple of decades now. > > We *should* have a string type (I don't care what you call it) that acts >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Marco Bubke
From: Development on behalf of Edward Welbourne Sent: Wednesday, January 23, 2019 2:53:00 PM To: Arnaud Clère; Thiago Macieira Cc: development@qt-project.org Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString All of this discussion ignores a major elephant: QString's indexing

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Edward Welbourne
All of this discussion ignores a major elephant: QString's indexing is by 16-bit UTF-16 tokens, not by Unicode characters. We've had Unicode for a couple of decades now. We *should* have a string type (I don't care what you call it) that acts on strings indexed by Unicode characters, not in
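The gap between 16-bit units and Unicode characters can be shown without Qt. The following is a rough sketch in standard C++; `countCodePoints` and the two surrogate predicates are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <string_view>

// True if the 16-bit unit is a leading (high) surrogate, U+D800..U+DBFF.
constexpr bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
// True if the 16-bit unit is a trailing (low) surrogate, U+DC00..U+DFFF.
constexpr bool isLowSurrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: counts Unicode code points in UTF-16 text.
// A well-formed surrogate pair counts once; a lone surrogate also counts once.
inline std::size_t countCodePoints(std::u16string_view s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (isHighSurrogate(s[i]) && i + 1 < s.size() && isLowSurrogate(s[i + 1]))
            ++i; // skip the trailing half of the pair
        ++n;
    }
    return n;
}
```

A string holding only U+1F600 has a size of 2 when measured in 16-bit units, while `countCodePoints` reports 1. Indexing by unit therefore addresses halves of a character for anything outside the BMP.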

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-23 Thread Arnaud Clère
> -Original Message- > From: Thiago Macieira > > On Tuesday, 22 January 2019 09:01:16 PST Arnaud Clère wrote: > > QByteArray is the official way to deal with utf8 strings but: > > 1. This discussion shows it is not as known as it should be and I > > argue the name does not help 2.

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 09:01:16 PST Arnaud Clère wrote: > QByteArray is the official way to deal with utf8 strings but: > 1. This discussion shows it is not as known as it should be and I argue the > name does not help > 2. Dealing with binary data and all kind of string > encodings in a

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 11:02:22 PST Matthew Woehlke wrote: > On 18/01/2019 11.09, Thiago Macieira wrote: > > As for strings, the QString constructor takes UTF-8 input, but however > > fast > > the decoder is, it's still slightly slower than the Latin1 decoder. So if > > your string is purely

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Matthew Woehlke
On 18/01/2019 11.09, Thiago Macieira wrote: > As for strings, the QString constructor takes UTF-8 input, but however fast > the decoder is, it's still slightly slower than the Latin1 decoder. So if > your > string is purely US-ASCII, using QLatin1String is recommended. ...but I assume
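One plausible reason a Latin-1 decoder is cheaper than a UTF-8 decoder is that every Latin-1 byte value equals its Unicode code point, so conversion to UTF-16 is a plain widening copy with no branching on multi-byte sequences. The sketch below illustrates that in standard C++; `latin1ToUtf16` is a hypothetical name and is not how Qt implements its decoder.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: widens Latin-1 bytes to UTF-16 code units.
// Each byte 0x00..0xFF maps to the code point with the same value,
// so no multi-byte decoding or validation is needed.
inline std::u16string latin1ToUtf16(std::string_view latin1)
{
    std::u16string out;
    out.reserve(latin1.size());
    for (char c : latin1) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

The output always has exactly as many units as the input has bytes, which also lets the destination be sized up front.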

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Arnaud Clère
> Original Message- > From: Jason H > > > From: "Arnaud Clère" > > > > > -Original Message- > > > From: Allan Sandfeld Jensen > > > > > > Use QByteArray when you can. > > > > I think a QUtf8String class derived from QByteArray would help a lot making > > this happen in the

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Thiago Macieira
On Tuesday, 22 January 2019 06:49:51 PST Jason H wrote: > typedef QSymbolSequence QLatin1String; > typedef QSymbolSequence QByteArray; > typedef QSymbolSequence QByteArray; > typedef QSymbolSequence QString; > > So they can have the same API? It really seems to me that the issue is > storage, not

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-22 Thread Jason H
> Sent: Monday, January 21, 2019 at 9:51 AM > From: "Arnaud Clère" > To: "Allan Sandfeld Jensen" , "development@qt-project.org" > > Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString > > > -Original Message-

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-21 Thread Arnaud Clère
> -Original Message- > From: Allan Sandfeld Jensen > > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > > Any chance of having UTF-8 storage support for QString? > > > Use QByteArray when you can. I think a QUtf8String class derived from QByteArray would help a lot

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 08:57:19 PST Tor Arne Vestbø wrote: > > On 18 Jan 2019, at 17:21, Thiago Macieira > > Actually, what we should do is allow everywhere > > > > functionTakingString(u"Tor Arne Vestbø") > > // (note the u) > > Yes, this would be awesome! Please let’s do this  >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 08:13:40 PST Kai Koehne wrote: > 1. We generally compile Qt code with QT_NO_CAST_FROM_ASCII that disables the > QString(const char *) overload. And we do that so that you have to make it > explicit whether you really want to do the implicit conversion from UTF-8 > to

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Kai Koehne
> -Original Message- > From: Development On Behalf Of Tor > Arne Vestbø > Sent: Friday, January 18, 2019 4:27 PM > To: Jedrzej Nowacki > Cc: Thiago Macieira ; development@qt- > project.org > Subject: Re: [Development] Qt6: Adding UTF-8 storage support to

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Thiago Macieira
On Friday, 18 January 2019 07:26:51 PST Tor Arne Vestbø wrote: > If we plan to standardise on our Qt source code being UTF8, can we please > allow QString("Tor Arne Vestbø") without going through > QLatin1Literal/QStringLiteral/QLatin1String/etc etc? I think we now can. The last problem we had

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Tor Arne Vestbø
Picking up on this: If we plan to standardise on our Qt source code being UTF8, can we please allow QString("Tor Arne Vestbø") without going through QLatin1Literal/QStringLiteral/QLatin1String/etc etc? Tor Arne > On 18 Jan 2019, at 16:01, Jedrzej Nowacki wrote: > > On Wednesday, 16 January

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-18 Thread Jedrzej Nowacki
On Wednesday, 16 January 2019 21:12:55 CET André Pönitz wrote: > On Tue, Jan 15, 2019 at 10:44:45PM +0100, Allan Sandfeld Jensen wrote: > > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > > > Hi, > > > > > > With every Qt release we see how the new release improved over previous >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-17 Thread Thiago Macieira
On Thursday, 17 January 2019 13:27:40 PST Martin Koller wrote: > On Wednesday, 16 January 2019 19:44:27 CET Konstantin Tokarev wrote: > > From QtWebKit perspective it would be great if Qt APIs which require > > QString now would also accept QLatin1String at least for ASCII-only data > is QtWebKit

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-17 Thread Martin Koller
On Wednesday, 16 January 2019 19:44:27 CET Konstantin Tokarev wrote: > From QtWebKit perspective it would be great if Qt APIs which require QString > now would also accept QLatin1String at least for ASCII-only data is QtWebKit still alive? Seems there is nobody working on it since more than a

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Thiago Macieira
On Wednesday, 16 January 2019 13:16:39 PST Konstantin Tokarev wrote: > 1. Code points may be encoded as surrogate pairs in UTF-16, e.g. this is the > case for Emoji characters. QString ignores this fact, indexing 16-bit > QChars. To make things worse, several QString methods like left(), right(),
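The hazard with unit-based slicing is that a cut can land between the two halves of a surrogate pair. Below is a minimal sketch of a truncation that avoids this, written in standard C++ rather than against QString; `leftSafe` is a hypothetical helper, and backing up by one unit instead of rounding forward is a design assumption of this example.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns at most n leading UTF-16 units of s,
// but never ends the result on the first half of a surrogate pair.
inline std::u16string leftSafe(std::u16string_view s, std::size_t n)
{
    if (n >= s.size())
        return std::u16string(s);
    const bool prevIsHigh = n > 0 && s[n - 1] >= 0xD800 && s[n - 1] <= 0xDBFF;
    const bool nextIsLow = s[n] >= 0xDC00 && s[n] <= 0xDFFF;
    if (prevIsHigh && nextIsLow)
        --n; // the cut would split a pair, so drop the leading half too
    return std::u16string(s.substr(0, n));
}
```

With the input `u"a\U0001F600b"` (four units), asking for 2 units returns only `u"a"`, whereas a naive unit-based cut would return `a` followed by a dangling high surrogate.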

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev
15.01.2019, 23:13, "Alexander Akulich" : > Cristian, > > the previous discussion is "Why can't QString use UTF-8 internally?" > There is something wrong with our maillist, the best link I found is > [1]. For some reason link to the thread head [2] is broken. > > [1] >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread André Pönitz
On Tue, Jan 15, 2019 at 10:44:45PM +0100, Allan Sandfeld Jensen wrote: > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > > Hi, > > > > With every Qt release we see how the new release improved over previous > > releases in terms of speed, memory consumption, etc. > > > > Any

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Jason H
> Sent: Tuesday, January 15, 2019 at 4:44 PM > From: "Allan Sandfeld Jensen" > To: development@qt-project.org > Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString > > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > > Hi, >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Thiago Macieira
On Wednesday, 16 January 2019 10:44:27 PST Konstantin Tokarev wrote: > From QtWebKit perspective it would be great if Qt APIs which require QString > now would also accept QLatin1String at least for ASCII-only data Which ones? Currently, the only thing that takes QLatin1String in the API is

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev
15.01.2019, 21:45, "Cristian Adam" : > Hi, > > With every Qt release we see how the new release improved over previous > releases in terms of speed, memory consumption, etc. > > Any chance of having UTF-8 storage support for QString? > > UTF-8 is native on Linux and other *NIX platforms, Qt

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Tokarev
16.01.2019, 00:46, "Allan Sandfeld Jensen" : > On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: >> Hi, >> >> With every Qt release we see how the new release improved over previous >> releases in terms of speed, memory consumption, etc. >> >> Any chance of having UTF-8 storage

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Edward Welbourne
Marco Bubke (16 January 2019 10:59) reported: >> https://utf8everywhere.org/ states "UTF-16 is the worst of both >> worlds, being both variable length and too wide" Konstantin Ritt (16 January 2019 17:50) replied > https://utf8everywhere.org/ states bullshit. try reading an alternative >

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Konstantin Ritt
> https://utf8everywhere.org/ states *"UTF-16 is the worst of both worlds, being both variable length and too wide"* https://utf8everywhere.org/ *states bullshit. Try reading alternative sources.* Regards, Konstantin Wed, 16 Jan 2019 at 13:20, Edward Welbourne : > Marco Bubke (16

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Edward Welbourne
Marco Bubke (16 January 2019 10:59) > You can use std::string which has small string optimization instead of > QByteArray too. In many cases where you would use const String > you can use std::string_view, so you are more flexible. Note that we now have a QStringView, which can likewise replace

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-16 Thread Marco Bubke
, January 15, 2019 10:44:45 PM To: development@qt-project.org Subject: Re: [Development] Qt6: Adding UTF-8 storage support to QString On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > Hi, > > With every Qt release we see how the new release improved over previous > relea

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Allan Sandfeld Jensen
On Tuesday, 15 January 2019 19:43:57 CET Cristian Adam wrote: > Hi, > > With every Qt release we see how the new release improved over previous > releases in terms of speed, memory consumption, etc. > > Any chance of having UTF-8 storage support for QString? > Use QByteArray when you can.

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Alexander Akulich
Cristian, the previous discussion is "Why can't QString use UTF-8 internally?" There is something wrong with our mailing list, the best link I found is [1]. For some reason link to the thread head [2] is broken. [1] https://lists.qt-project.org/pipermail/development/2015-February/040199.html [2]

Re: [Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Thiago Macieira
On Tuesday, 15 January 2019 10:43:57 PST Cristian Adam wrote: > Any chance of having UTF-8 storage support for QString? No. -- Thiago Macieira - thiago.macieira (AT) intel.com Software Architect - Intel Open Source Technology Center ___

[Development] Qt6: Adding UTF-8 storage support to QString

2019-01-15 Thread Cristian Adam
Hi, With every Qt release we see how the new release improved over previous releases in terms of speed, memory consumption, etc. Any chance of having UTF-8 storage support for QString? UTF-8 is native on Linux and other *NIX platforms, Qt programs should use less memory, and perform better by
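The memory argument can be checked with simple arithmetic on the two encodings. The sketch below uses only the C++ standard library; `utf8Bytes` and `utf16Bytes` are hypothetical helper names, and the figures cover payload bytes only, not allocator or container overhead.

```cpp
#include <cstddef>
#include <string>

// Payload size in bytes of text already encoded as UTF-8.
inline std::size_t utf8Bytes(const std::string &s) { return s.size(); }

// Payload size in bytes of text already encoded as UTF-16.
inline std::size_t utf16Bytes(const std::u16string &s)
{
    return s.size() * sizeof(char16_t);
}
```

For ASCII text such as "Qt", UTF-8 needs 2 bytes and UTF-16 needs 4. The advantage is not universal: U+4E2D takes 3 bytes in UTF-8 ("\xE4\xB8\xAD") but only 2 in UTF-16, so the saving depends on the script being stored.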