Re: [Lazarus] dynamic string proposal

2017-08-18 Thread Juha Manninen via Lazarus
I am answering Tony's post from the "String vs WideString" thread here. On Thu, Aug 17, 2017 at 2:09 PM, Tony Whyman via Lazarus wrote: > Are you making my points for me? If such a basic term as "character" means 7 > different things then something is badly amiss. It should

Re: [Lazarus] dynamic string proposal

2017-08-17 Thread Sven Barth via Lazarus
On 17.08.2017 at 11:11, "Michael Schnell via Lazarus" <lazarus@lists.lazarus-ide.org> wrote: > > Maybe, Sven could answer to this mail in the other thread... > I provided an example in my answer to Tony Whyman in the same subbranch of the thread. Regards, Sven

Re: [Lazarus] dynamic string proposal

2017-08-17 Thread Michael Schnell via Lazarus
Maybe, Sven could answer to this mail in the other thread... On 14.08.2017 18:47, Sven Barth via Lazarus wrote: The main problem of such a dynamic type would be the inability to do fast indexing as the compiler would need to insert runtime checks for the size of a character. What
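The indexing concern quoted above can be illustrated with a small sketch. The `DynString` class below is hypothetical and is not taken from the proposal paper or from FPC. It only assumes that a "dynamic" string stores its element size in a field read at run time, so every index operation has to consult that field before it can compute an offset. Indexing is 0-based here for brevity, unlike Pascal strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DynString:
    """Hypothetical string whose element size is only known at run time."""
    data: bytes      # raw little-endian payload
    elem_size: int   # bytes per element: 1, 2 or 4

    def __len__(self) -> int:
        # Length in elements (code units), not in bytes.
        return len(self.data) // self.elem_size

    def __getitem__(self, index: int) -> int:
        if not 0 <= index < len(self):
            raise IndexError(index)
        # The offset depends on elem_size, which must be read at run time.
        # With a statically typed string the compiler could use a constant.
        start = index * self.elem_size
        return int.from_bytes(self.data[start:start + self.elem_size], "little")


narrow = DynString("AB".encode("latin-1"), 1)
wide = DynString("AB".encode("utf-16-le"), 2)
print(narrow[1], wide[1])
```

Both strings return the same element value for the same index, but the byte offset is computed differently each time, which is the extra per-access work the quoted message refers to.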

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 20:44, Juha Manninen via Lazarus wrote: So using "char" (the type) as reference to "codepoint" is something we have to do, because today the type "char" is for codepoints. Sorry I didn't understand this one. "Char" (the type) holds a codeunit, not a codepoint. Char is either 1
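The codeunit versus codepoint distinction raised here can be checked with a short sketch. It uses Python rather than Pascal, and the helper names are invented for illustration. It counts how many code units one code point needs in UTF-8 and UTF-16.

```python
def utf8_code_units(text: str) -> int:
    # One UTF-8 code unit is one byte.
    return len(text.encode("utf-8"))


def utf16_code_units(text: str) -> int:
    # One UTF-16 code unit is two bytes; "-le" avoids a byte order mark.
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    # A Python str is a sequence of code points.
    return len(text)


samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER A WITH DIAERESIS": "\u00e4",
    "GRINNING FACE": "\U0001F600",
}
for name, text in samples.items():
    print(name, code_points(text), utf8_code_units(text), utf16_code_units(text))
```

Each sample is a single code point, yet it takes between one and four 8-bit code units, or one or two 16-bit code units, so a type that holds one code unit cannot hold every code point.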

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 7:53 PM, Martin Frb via Lazarus wrote: >> I know CodeUnit and CodePoint are not called "character" officially by >> the Unicode Standard. >> They however are called "character" in normal communication. > > And that is where the problem

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 16:55, Juha Manninen via Lazarus wrote: On Wed, Aug 16, 2017 at 6:24 PM, Martin Frb via Lazarus wrote: Actually no. I know CodeUnit and CodePoint are not called "character" officially by the Unicode Standard. They however are called "character" in

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 18:06:36 +0200 Michael Schnell via Lazarus wrote: >[...] > The only difference to the current status is that with the "dynamic" > string brand the content of the "bytes per element" field is not > predefined by the variable declaration but can

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 17:55, Juha Manninen via Lazarus wrote: although Pos(), Copy() and Length() deal with CodeUnit resolution. I wonder how the new fancy string types would handle it without a performance penalty. This again is not in the scope of the paper, and supposed to stay as it is. S[x],

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 17:20, Juha Manninen via Lazarus wrote: Unicode is the standard now. We cannot ignore it, and we don't want to ignore it because it solves so many problems of the earlier solutions. If you create a new string type, you certainly must take Unicode into account. It is not "ignored",

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 16:20, Juha Manninen via Lazarus wrote: The word "character" in Unicode can mean: 1. CodeUnit — Represented by Pascal type "Char". Actually no. It can overlap. But a codeunit is NOT a character. For example a codeunit that holds a codepoint of class "combining mark", this is

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 4:49 PM, Michael Schnell via Lazarus wrote: >> You are writing about encodings etc. which are part of codepoints, but >> you call them "characters". Why? > > Because the type for this stuff used in Delphi and FPC is called "char". No,

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 13:48, Michael Schnell wrote: On 16.08.2017 14:30, Martin Frb via Lazarus wrote: And that would still not be "char", but "codepoint" A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not
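The point about combining code points can be made concrete with a minimal sketch, again in Python and using only the standard `unicodedata` module. The sample strings are chosen for illustration. One has a precomposed form and one does not, and the bit count assumes the code points are stored as UTF-32.

```python
import unicodedata


def bits_in_utf32(text: str) -> int:
    # Each code point occupies 32 bits in UTF-32.
    return len(text.encode("utf-32-le")) * 8


# "e" followed by COMBINING ACUTE ACCENT: two code points, one visible character.
e_acute = "e\u0301"
composed = unicodedata.normalize("NFC", e_acute)

# "q" with COMBINING ACUTE ACCENT and COMBINING DOT BELOW: no precomposed form exists.
q_marks = "q\u0301\u0323"
q_normalized = unicodedata.normalize("NFC", q_marks)

print(len(e_acute), len(composed))
print(len(q_marks), len(q_normalized), bits_in_utf32(q_marks))
```

Normalization collapses the first sequence to a single code point but leaves the second at three, which is where the figure of 96 bits for one user-perceived character comes from.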

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 13:37, Alexey via Lazarus wrote: On 16.08.2017 15:30, Martin Frb via Lazarus wrote: A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form). See my prev

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 15:33, Juha Manninen via Lazarus wrote: Why don't you implement such a system. This is all FOSS, free and open source. I would never dare to try to edit the compiler :-[ You are writing about encodings etc. which are part of codepoints, but you call them "characters". Why?

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 2:47 PM, Michael Schnell via Lazarus wrote: > -Michael (It's rather frustrating to discuss that obviously never will > happen :-() Why don't you implement such a system. This is all FOSS, free and open source. You are writing about

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Juha Manninen via Lazarus
On Wed, Aug 16, 2017 at 3:37 PM, Alexey via Lazarus wrote: > See my prev post: i see that each S[i] good to be like QWord (sizeof(one > char)= sizeof(Qword)). It can be TextChar. And type can be TextString. > internally it can be compressed to utf8. TextString is

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote: For some unknown reason you want to store different encodings in a TStrings and fear the "time-consuming" and loss-prone auto conversions. It's obvious that a user using a different encoding brand in a string var than that suggested by

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote: Not if complicated things get more complicated. Please leave out the additional encoding brands suggested just as an afterthought in the paper. These are not the purpose at all but (if the other stuff would be in place) just come as a

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 14:22, Alexey via Lazarus wrote: BTW, it will be good to have "Cstring" (or another name, not "dynamicstring") : ... You are missing the point the paper is supposed to be about: enhancing the versatility of the library functions such as those using TStrings. Not just creating

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 15:22:20 +0300 Alexey via Lazarus wrote: > On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote: > > When you propose a new string type "dynamicstring" you have to define these > > operators. > > BTW, it will be good to have "Cstring" (or

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 14:30, Martin Frb via Lazarus wrote: And that would still not be "char", but "codepoint" A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form).

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 13:47:26 +0200 Michael Schnell via Lazarus wrote: > On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote: > > You are confusing people if you name your encodings like this. > There also is no "official" Code pages named "Default" or

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Alexey via Lazarus
On 16.08.2017 15:30, Martin Frb via Lazarus wrote: A char can be composed of several combining code points (each of them afaik, in the 32 bit range). So a char can have 96 or more bits. (And not all of them have a combined form). See my prev post: i see that each S[i] good to be like QWord

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Martin Frb via Lazarus
On 16/08/2017 10:51, Mattias Gaertner via Lazarus wrote: Of course an appropriate "char" type for each string encoding brand could be provided, hence a "CP_QWord Char" as an alias or a QWord. There is no QWord codepage. That would be confusing. And that would still not be "char", but

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Alexey via Lazarus
On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote: When you propose a new string type "dynamicstring" you have to define these operators. BTW, it will be good to have "Cstring" (or another name, not "dynamicstring") : - [] operator is 0-based like Python/C - s[i] is DWORD per char

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote: You are confusing people if you name your encodings like this. There also is no "official" Code pages named "Default" or "None", the naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero. So I did the same and just

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Mattias Gaertner via Lazarus
On Wed, 16 Aug 2017 12:24:55 +0200 Michael Schnell via Lazarus wrote: > On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote: > > Every Delphi/FPC type has a bunch of operators. Strings support :=, =, > > <>, >=, <= and [] for read and write. > > When you

Re: [Lazarus] dynamic string proposal

2017-08-16 Thread Michael Schnell via Lazarus
On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote: Every Delphi/FPC type has a bunch of operators. Strings support :=, =, <>, >=, <= and [] for read and write. When you propose a new string type "dynamicstring" you have to define these operators. That is easily doable. The definition of