I am answering Tony's post here, in the "String vs WideString" thread.
On Thu, Aug 17, 2017 at 2:09 PM, Tony Whyman via Lazarus
wrote:
> Are you making my points for me? If such a basic term as "character" means 7
> different things then something is badly amiss. It should
On 17.08.2017 at 11:11, "Michael Schnell via Lazarus" <
lazarus@lists.lazarus-ide.org> wrote:
>
> Maybe, Sven could answer to this mail in the other thread...
>
I provided an example in my answer to Tony Whyman in the same subbranch of
the thread.
Regards,
Sven
--
Maybe, Sven could answer to this mail in the other thread...
On 14.08.2017 18:47, Sven Barth via Lazarus wrote:
The main problem of such a dynamic type would be the inability to do
fast indexing as the compiler would need to insert runtime checks for
the size of a character.
What
On 16/08/2017 20:44, Juha Manninen via Lazarus wrote:
So using "char" (the type) as reference to "codepoint" is something we have
to do, because today the type "char" is for codepoints.
Sorry I didn't understand this one.
"Char" (the type) holds a codeunit, not a codepoint. Char is either 1
On Wed, Aug 16, 2017 at 7:53 PM, Martin Frb via Lazarus
wrote:
>> I know CodeUnit and CodePoint are not called "character" officially by
>> the Unicode Standard.
>> They however are called "character" in normal communication.
>
> And that is where the problem
On 16/08/2017 16:55, Juha Manninen via Lazarus wrote:
On Wed, Aug 16, 2017 at 6:24 PM, Martin Frb via Lazarus
wrote:
Actually no.
I know CodeUnit and CodePoint are not called "character" officially by
the Unicode Standard.
They however are called "character" in
On Wed, 16 Aug 2017 18:06:36 +0200
Michael Schnell via Lazarus wrote:
>[...]
> The only difference to the current status is that with the "dynamic"
> string brand the content of the "bytes per element" field is not
> predefined by the variable declaration but can
On 16.08.2017 17:55, Juha Manninen via Lazarus wrote:
although Pos(), Copy() and Length() deal with CodeUnit resolution.
I wonder how the new fancy string types would handle it without a
performance penalty.
This again is not in the scope of the paper, and supposed to stay as it
is. S[x],
On 16.08.2017 17:20, Juha Manninen via Lazarus wrote:
Unicode is the standard now. We cannot ignore it, and we don't want to
ignore it because it solves so many problems of the earlier solutions.
If you create a new string type, you certainly must take Unicode into account.
It is not "ignored",
On 16/08/2017 16:20, Juha Manninen via Lazarus wrote:
The word "character" in Unicode can mean:
1. CodeUnit — Represented by Pascal type "Char".
Actually no.
It can overlap. But a codeunit is NOT a character.
For example a codeunit that holds a codepoint of class "combining mark",
this is
On Wed, Aug 16, 2017 at 4:49 PM, Michael Schnell via Lazarus
wrote:
>> You are writing about encodings etc. which are part of codepoints, but
>> you call them "characters". Why?
>
> Because the type for this stuff used in Delphi and FPC is called "char".
No,
On 16/08/2017 13:48, Michael Schnell wrote:
On 16.08.2017 14:30, Martin Frb via Lazarus wrote:
And that would still not be "char", but "codepoint"
A char can be composed of several combining code points (each of them
afaik, in the 32 bit range).
So a char can have 96 or more bits. (And not
On 16/08/2017 13:37, Alexey via Lazarus wrote:
On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
A char can be composed of several combining code points (each of them
afaik, in the 32 bit range).
So a char can have 96 or more bits. (And not all of them have a
combined form).
See my prev
On 16.08.2017 15:33, Juha Manninen via Lazarus wrote:
Why don't you implement such a system. This is all FOSS, free and open
source.
I would never dare to try to edit the compiler :-[
You are writing about encodings etc. which are part of codepoints, but
you call them "characters". Why?
On Wed, Aug 16, 2017 at 2:47 PM, Michael Schnell via Lazarus
wrote:
> -Michael (It's rather frustrating to discuss that obviously never will
> happen :-()
Why don't you implement such a system. This is all FOSS, free and open source.
You are writing about
On Wed, Aug 16, 2017 at 3:37 PM, Alexey via Lazarus
wrote:
> See my prev post: i see that each S[i] good to be like QWord (sizeof(one
> char)= sizeof(Qword)). It can be TextChar. And type can be TextString.
> internally it can be compressed to utf8. TextString is
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:
For some unknown
reason you want to store different encodings in a TStrings and fear
the "time-consuming" and loss-prone auto conversions.
It's obvious that a user using a different encoding brand in a string
var than that suggested by
On 16.08.2017 14:43, Mattias Gaertner via Lazarus wrote:
Not if complicated things get more complicated.
Please leave out the additional encoding brands suggested just as an
afterthought in the paper. These are not the purpose at all but (if the
other stuff would be in place) just come as a
On 16.08.2017 14:22, Alexey via Lazarus wrote:
BTW, it will be good to have "Cstring" (or another name, not
"dynamicstring") : ...
You are missing the point the paper is supposed to be about: enhancing
the versatility of the library functions such as those using TStrings.
Not just creating
On Wed, 16 Aug 2017 15:22:20 +0300
Alexey via Lazarus wrote:
> On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:
> > When you propose a new string type "dynamicstring" you have to define these
> > operators.
>
> BTW, it will be good to have "Cstring" (or
On 16.08.2017 14:30, Martin Frb via Lazarus wrote:
And that would still not be "char", but "codepoint"
A char can be composed of several combining code points (each of them
afaik, in the 32 bit range).
So a char can have 96 or more bits. (And not all of them have a
combined form).
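
To make the point about combining code points concrete, here is a small sketch. It is in Python only because that is short to write; it is not from the thread, and the sample strings are my own choice. It builds one user-perceived character from a base letter plus two combining marks and counts what it costs in each encoding form.

```python
import unicodedata

# "e" + COMBINING ACUTE ACCENT + COMBINING DOT BELOW:
# one user-perceived character made of three code points.
s = "e\u0301\u0323"

codepoints = len(s)                             # Python's str counts code points
utf8_units = len(s.encode("utf-8"))             # number of 1-byte code units
utf16_units = len(s.encode("utf-16-le")) // 2   # number of 2-byte code units
utf32_bits = len(s.encode("utf-32-le")) * 8     # bits at 32 per code point

# Some sequences have a precomposed ("combined") form, reachable through NFC:
# "e" + COMBINING ACUTE ACCENT becomes the single code point U+00E9.
nfc = unicodedata.normalize("NFC", "e\u0301")

print(codepoints, utf8_units, utf16_units, utf32_bits, len(nfc))
```

With 32 bits per code point the three code points take 96 bits, and each further combining mark adds another 32.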
On Wed, 16 Aug 2017 13:47:26 +0200
Michael Schnell via Lazarus wrote:
> On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:
> > You are confusing people if you name your encodings like this.
> There also are no "official" code pages named "Default" or
On 16.08.2017 15:30, Martin Frb via Lazarus wrote:
A char can be composed of several combining code points (each of them
afaik, in the 32 bit range).
So a char can have 96 or more bits. (And not all of them have a
combined form).
See my prev post: i see that each S[i] good to be like QWord
On 16/08/2017 10:51, Mattias Gaertner via Lazarus wrote:
Of course an appropriate "char" type for each string encoding brand
could be provided, hence a "CP_QWord Char" as an alias for a QWord.
There is no QWord codepage. That would be confusing.
And that would still not be "char", but
On 16.08.2017 12:51, Mattias Gaertner via Lazarus wrote:
When you propose a new string type "dynamicstring" you have to define these
operators.
BTW, it will be good to have "Cstring" (or another name, not
"dynamicstring") :
- [] operator is 0-based like Python/C
- s[i] is DWORD per char
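
A rough sketch of why a fixed-width element makes s[i] cheap, again in Python and not taken from the thread. The helper names are made up for illustration, and valid UTF-8 input is assumed. With one DWORD per code point the i-th element sits at a fixed byte offset, while in UTF-8 the code has to walk over every preceding sequence to find it.

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >= 0xF0:
        return 4
    if lead >= 0xE0:
        return 3
    return 2  # lead byte 0xC0..0xDF (valid input assumed)


def codepoint_at_utf32(buf: bytes, i: int) -> str:
    """Fixed width: the i-th code point starts at byte 4 * i."""
    start = 4 * i
    return buf[start:start + 4].decode("utf-32-le")


def codepoint_at_utf8(buf: bytes, i: int) -> str:
    """Variable width: skip i sequences one by one, then decode the next."""
    pos = 0
    for _ in range(i):
        pos += utf8_seq_len(buf[pos])
    return buf[pos:pos + utf8_seq_len(buf[pos])].decode("utf-8")


# Sample text with 1-, 2-, 3- and 4-byte UTF-8 sequences (0-based indexing).
text = "a\u00e9\u20ac\U0001F600z"
as_utf32 = text.encode("utf-32-le")
as_utf8 = text.encode("utf-8")

for i in range(len(text)):
    assert codepoint_at_utf32(as_utf32, i) == codepoint_at_utf8(as_utf8, i)
```

Note that this indexes code points, not whole characters: a base letter with combining marks still spans several elements even at DWORD width.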
On 16.08.2017 13:17, Mattias Gaertner via Lazarus wrote:
You are confusing people if you name your encodings like this.
There also are no "official" code pages named "Default" or "None", the
naming "CP_DEFAULT" and "CP_NONE" has just been invented by Embarcadero.
So I did the same and just
On Wed, 16 Aug 2017 12:24:55 +0200
Michael Schnell via Lazarus wrote:
> On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:
> > Every Delphi/FPC type has a bunch of operators. Strings support :=, =,
> > <>, >=, <= and [] for read and write.
> > When you
On 16.08.2017 11:51, Mattias Gaertner via Lazarus wrote:
Every Delphi/FPC type has a bunch of operators. Strings support :=, =,
<>, >=, <= and [] for read and write.
When you propose a new string type "dynamicstring" you have to define these
operators.
That is easily doable.
The definition of