Re: [Lazarus] String vs WideString

2017-08-15 Thread Bo Berglund via Lazarus
On Tue, 15 Aug 2017 21:22:10 +0200, Luca Olivetti via Lazarus
 wrote:

>(I remarked the "if" because I don't know if that's the case, according 
>to Bo Berglund's experience it is)

Just to expand on my "experience" and the reason I posted:

My work on converting the old program started back a couple of years
when I went from Delphi 2007 (pre-unicode) to Delphi XE5 because we
wanted the GUI to be translatable to non-western languages.

But then all the communications functions (and these are many in this
utility application) broke because they used strings as containers for
the inherently binary serial data.

So I followed advice on the Embarcadero forum to switch to AnsiString
because that was really what the old string type was an alias for.
I had no great insight into the inner workings of the string handling
functions but I "knew" that AnsiString was a 1-byte-per-element and
(unicode)string was now a 2-byte-per-element container. The fact that
the code could alter the content of the AnsiString did not dawn on me
at all.
And the comm functions worked fine after the change (I tested a lot,
but of course only on my English Win7 computer).

Then some time ago there was a report of a failure of the new program
version that only happened in Korea, China and Thailand. In the log
files there was a very strange entry about finding an illegal command
byte when sending a command to the equipment.

It never triggered when I debugged the problem; for me and my
colleagues it worked flawlessly. So I had to add more logging and found
that the problem arose when the outgoing command was built. A certain
1-byte command was then expanded to 2 bytes with the wrong first byte!
The commands in the protocol are the first byte of the data of a
telegram and they are in range $C0..$E9.
When one of these (I don't now remember exactly which one) was used in
an assignment to the AnsiString buffer it was converted to $3F +
something that was never logged and the operation failed because the
equipment could not decode the command.
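
A minimal sketch of the kind of implicit conversion described above, assuming Free Pascal 3.0 or later and a platform whose conversion routines know code page 949 (on Unix this needs the cwstring unit). The type name TCP949String and the choice of U+00C5 are illustrative assumptions, not the code of the original program. When a wide character is assigned to a string type with a declared code page, the compiler inserts a conversion, and a character that the target code page cannot represent is typically replaced, often by '?' ($3F). The exact bytes depend on the platform, so the program only prints what it gets.

```pascal
program CodepageLoss;
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring;   // installs a widestring manager so conversions work on Unix
{$endif}

type
  // Illustrative alias: a single-byte string tagged with code page 949 (Korean)
  TCP949String = type AnsiString(949);

var
  Source: UnicodeString;
  Target: TCP949String;
  i: Integer;
begin
  Source := WideChar($00C5);   // one UTF-16 code unit, value $C5
  Target := Source;            // implicit UTF-16 -> CP949 conversion happens here
  Write('bytes after conversion: ');
  for i := 1 to Length(Target) do
    Write(HexStr(Ord(Target[i]), 2), ' ');
  WriteLn;
end.
```

Running the same program with a different code page in the type declaration shows how the result changes with the target encoding, which is consistent with a bug that only appears on some localized systems.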

So I asked again on the forum and was steered towards RawByteString
because presumably that container would disallow conversions.
And when I changed this and sent a new version to the distributor in
Korea the problem was seemingly gone.

Based on this experience I wanted to alert the OP to the fact that
using AnsiString instead of string is not a cure-all for binary data;
you need to fix the codepage too, which is what RawByteString does
for you.
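
A minimal sketch of byte-oriented helpers around RawByteString, assuming Free Pascal 3.0 or later. The names AppendByte and HexDump are hypothetical. The sketch avoids character conversion in two ways: bytes are written by index as AnsiChar values, and the logging helper takes a RawByteString parameter, which accepts any single-byte string as it is.

```pascal
program RawBuffer;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Append one byte by writing the element directly; no character
// conversion is involved in an indexed write.
procedure AppendByte(var Buf: RawByteString; B: Byte);
begin
  SetLength(Buf, Length(Buf) + 1);
  Buf[Length(Buf)] := AnsiChar(B);
end;

// Hex dump for log files, so every byte of a telegram is visible.
function HexDump(const S: RawByteString): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(S) do
    Result := Result + IntToHex(Ord(S[i]), 2) + ' ';
end;

var
  Buf: RawByteString;
begin
  Buf := '';
  AppendByte(Buf, $C5);
  AppendByte(Buf, $00);
  AppendByte(Buf, $FF);
  WriteLn(HexDump(Buf));
end.
```

Logging the full hex dump of the outgoing buffer would also have shown the second byte that was "never logged" in the failure above.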

But I have now moved on and replaced all comm related containers with
TBytes including modifying the serial component we have used.
(With some help from Remy Lebeau).
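
A minimal sketch of building a telegram in a TBytes buffer, assuming Free Pascal with the SysUtils unit. The function BuildTelegram and the payload values are hypothetical; the only detail taken from the description above is that the command is the first byte of the telegram data. A byte array has no code page, so nothing can be converted behind the program's back.

```pascal
program TelegramBytes;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: command byte first, payload bytes after it.
function BuildTelegram(Cmd: Byte; const Payload: array of Byte): TBytes;
var
  i: Integer;
begin
  Result := nil;
  SetLength(Result, 1 + Length(Payload));
  Result[0] := Cmd;
  for i := 0 to High(Payload) do
    Result[1 + i] := Payload[i];
end;

var
  Telegram: TBytes;
  i: Integer;
begin
  Telegram := BuildTelegram($C5, [$01, $02]);
  for i := 0 to High(Telegram) do
    Write(IntToHex(Telegram[i], 2), ' ');
  WriteLn;
end.
```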


-- 
Bo Berglund
Developer in Sweden

-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus

On 2017-08-15 20:22, Luca Olivetti via Lazarus wrote:

Wait a minute, why "abuse"?
After all, before codepage-aware strings, an ansistring could store any kind
of arbitrary data with no problem and no conversion, and made it
extremely easy



Just listen to what you are saying: a string type, and you want to 
store all kinds of non-string related data in that type. How is that not 
"abuse"???  Use a TBytes, TStream or other binary byte-based storage 
mechanism. A string type was definitely not the right choice.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus

El 15/08/17 a les 22:08, Luca Olivetti ha escrit:

El 15/08/17 a les 21:38, Ondrej Pokorny via Lazarus ha escrit:

On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace old 
string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
PAnsiChar) and you are good to go. Annoying but no big deal.



If that's all it's OK then, thank you.


Sorry for the direct reply, it was meant for the list.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus

El 15/08/17 a les 21:38, Ondrej Pokorny via Lazarus ha escrit:

On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace old 
string for AnsiString/RawByteString (and Char for AnsiChar, PChar for 
PAnsiChar) and you are good to go. Annoying but no big deal.


If that's all it's OK then, thank you.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007



Re: [Lazarus] The new kid is growing up fast

2017-08-15 Thread Ondrej Pokorny via Lazarus

Too bad that Eugene didn't decide to improve Lazarus Cocoa bindings :)

Ondrej


Re: [Lazarus] String vs WideString

2017-08-15 Thread Ondrej Pokorny via Lazarus

On 15.08.2017 21:34, Mattias Gaertner via Lazarus wrote:

On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:


[...]
*If* code that worked before (and dare I say without abusing the
language) suddenly breaks, the bug is in the compiler and not in the
library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.


Furthermore, if you use(d) strings for binary data, just replace the old 
string with AnsiString/RawByteString (and Char with AnsiChar, PChar with 
PAnsiChar) and you are good to go. Annoying but no big deal.


Ondrej


Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Tue, 15 Aug 2017 21:22:10 +0200
Luca Olivetti via Lazarus  wrote:

>[...]
> *If* code that worked before (and dare I say without abusing the 
> language) suddenly breaks, the bug is in the compiler and not in the 
> library.

... unless of course the incompatibility is deliberate and documented.
In this case it is.

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Luca Olivetti via Lazarus

El 15/08/17 a les 21:14, Graeme Geldenhuys via Lazarus ha escrit:

On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:

but for 3rd party libraries/components (e.g.
synapse comes to mind


Then better start filing bug reports to all those 3rd party libraries 
and components - they have been abusing the system and will silently 
fail. Not to mention that FPC is almost at v3.0.4 and the new string 
changes were introduced in v3.0.0 already.


Wait a minute, why "abuse"?
After all, before codepage-aware strings, an ansistring could store any kind 
of arbitrary data with no problem and no conversion, and made it 
extremely easy to, e.g., add bytes to a buffer or find and extract data 
from the same buffer.
*If* code that worked before (and dare I say without abusing the 
language) suddenly breaks, the bug is in the compiler and not in the 
library.
(I remarked the "if" because I don't know if that's the case, according 
to Bo Berglund's experience it is)


Bye

--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus

On 2017-08-15 18:29, Luca Olivetti via Lazarus wrote:

but for 3rd party libraries/components (e.g.
synapse comes to mind


Then better start filing bug reports to all those 3rd party libraries 
and components - they have been abusing the system and will silently 
fail. Not to mention that FPC is almost at v3.0.4 and the new string 
changes were introduced in v3.0.0 already.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-15 Thread wkitty42--- via Lazarus

On 08/15/2017 05:25 AM, Michael Van Canneyt via Lazarus wrote:

As it is now, FPC offers a way out for all cases:

WideString/UnicodeString for those that want 2-byte characters.



what if 3 and 4 byte characters are required? will they also work in 
UnicodeStrings?
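
A minimal sketch related to this question, assuming Free Pascal 3.0 or later. UnicodeString holds UTF-16 code units, so a code point outside the Basic Multilingual Plane (one that needs 4 bytes in UTF-8) is stored as a surrogate pair, that is, as two elements. The code point U+1F600 is just an illustrative choice.

```pascal
program SurrogateDemo;
{$mode objfpc}{$H+}

var
  U: UnicodeString;
  U8: RawByteString;
begin
  // U+1F600 written as its UTF-16 surrogate pair D83D DE00
  SetLength(U, 2);
  U[1] := WideChar($D83D);
  U[2] := WideChar($DE00);

  U8 := UTF8Encode(U);   // same code point, re-encoded as UTF-8

  WriteLn('UTF-16 code units: ', Length(U));
  WriteLn('UTF-8 bytes:       ', Length(U8));
end.
```

Length counts elements, not characters, so code that indexes a UnicodeString has to be aware that one character may occupy two elements.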

i'm looking at this from a linux POV but have been trying to come from the very 
old school DOS TP stuff using codepages... especially needing to be able to read 
codepage strings and properly convert all their characters to UTF-8...


converting back would be a huge help, too... even with the possible loss of 
characters requiring replacing them with "?" or something to hold their place 
and show they didn't convert... that or even leaving them in their 2, 3 or 4 
byte form and let those using older codepage stuff see them raw...
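
A minimal sketch of a round trip between an old DOS code page and UTF-8, assuming Free Pascal 3.0 or later and a platform whose conversion routines know code page 437 (on Unix this needs the cwstring unit). The type name TCP437String and the procedure DumpBytes are hypothetical. Assignments between string types with different declared code pages make the compiler insert the conversion; what happens to characters that cannot be represented depends on the platform, so the program only prints the resulting bytes.

```pascal
program CP437RoundTrip;
{$mode objfpc}{$H+}
{$ifdef unix}
uses cwstring;   // widestring manager for conversions on Unix
{$endif}

type
  // Illustrative alias: single-byte string tagged with DOS code page 437
  TCP437String = type AnsiString(437);

// Print the raw bytes of any single-byte string without converting it.
procedure DumpBytes(const Title: string; const S: RawByteString);
var
  i: Integer;
begin
  Write(Title, ': ');
  for i := 1 to Length(S) do
    Write(HexStr(Ord(S[i]), 2), ' ');
  WriteLn;
end;

var
  Dos, Back: TCP437String;
  U8: UTF8String;
begin
  SetLength(Dos, 1);
  Dos[1] := AnsiChar($81);   // u-umlaut in CP437

  U8 := Dos;                 // implicit CP437 -> UTF-8 conversion
  Back := U8;                // implicit UTF-8 -> CP437 conversion

  DumpBytes('CP437', Dos);
  DumpBytes('UTF-8', U8);
  DumpBytes('back ', Back);
end.
```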



--
 NOTE: No off-list assistance is given without prior approval.
   *Please keep mailing list traffic on the list unless*
   *a signed and pre-paid contract is in effect with us.*


Re: [Lazarus] String vs WideString

2017-08-15 Thread Graeme Geldenhuys via Lazarus

On 2017-08-15 10:52, Michael Van Canneyt via Lazarus wrote:

The only 'problem' is that TStrings uses a single-byte string.


Why can't that be changed to a UnicodeString or UTF8String - after all, 
the Unicode standard is meant to support all languages. I would have 
thought that would be an obvious move for a Unicode-aware RTL. TStrings 
could also be extended (if it hasn't already) to keep track of what 
encoding was read in from file, and what encoding it should produce 
when lines are extracted - in case those two encodings are not the same.


Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Tue, 15 Aug 2017 16:44:30 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:
> > Do you mean a 'char' is a string in your proposal?   
> Nope. In my proposal there would be Chars for any statically encoded 
> String Type, hence 1, 2, 4, and 8 byte wide. (As regards statically 
> encoded string (and char) brands, it's just an extension of the existing 
> paradigm.)

8 bytes?

Do you propose a string without the array operator [] ?

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 14:53, Mattias Gaertner via Lazarus wrote:
Do you mean a 'char' is a string in your proposal? 
Nope. In my proposal there would be Chars for any statically encoded 
String Type, hence 1, 2, 4, and 8 byte wide. (As regards statically 
encoded string (and char) brands, it's just an extension of the existing 
paradigm.)


I did not think about the necessity to also have a dynamically encoded 
Char type. If so, it (like a string) would need the additional fields 
for encoding number and bytes_per_char, and the appropriate compiler 
magic to handle them appropriately (a work-alike to a one-element string).


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus



On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:


On Tue, 15 Aug 2017 14:26:34 +0200
Michael Schnell via Lazarus  wrote:


On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> Why shouldn't there be a single char type that intuitively represents 
> a single character regardless of how many bytes are used to represent it. 

I suppose by "char" you mean "single printable thingy" with Unicode it's 
rather debatable what such a thingy is.


Hence a Unicode single char would need to just be a Unicode string.


Do you mean a 'char' is a string in your proposal?


That would be a neat recursive definition :)

Michael.


Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Tue, 15 Aug 2017 14:26:34 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
> > Why shouldn't there be a single char type that intuitively represents 
> > a single character regardless of how many bytes are used to represent it.  
> 
> I suppose by "char" you mean "single printable thingy" with Unicode it's 
> rather debatable what such a thingy is.
> 
> Hence a Unicode single char would need to just be a Unicode string.

Do you mean a 'char' is a string in your proposal?

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:


3. The problem with string handling today is that it is not based on a 
consistent approach to the character type.


If you clean up character handling then the model for string
handling should become obvious. A string is after all no more than
a container for a character array and which should be constrained
to have the same character encoding. A string should intuitively
represent a string of text regardless of how many bytes are used
to represent each character and with dynamic attributes to tell
you how it is encoded.


4. FPC should clean up Delphi's mess for it. If a unified string type 
follows a consistent model then it should be possible to make all 
Delphi string types synonyms.


You will need to allow exceptions for legacy programs that insist
on manipulating the bytes themselves - but that is not rocket
science. There is also the issue of the Windows API and its
insistence on Wide Strings - but isn't that why calling
conventions such as cdecl and stdcall exist - to tell the compiler
when it needs to reformat the call for a given API convention.


see -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:
Why shouldn't there be a single char type that intuitively represents 
a single character regardless of how many bytes are used to represent it.


I suppose by "char" you mean "single printable thingy" with Unicode it's 
rather debatable what such a thingy is.


Hence a Unicode single char would need to just be a Unicode string.

-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Bart via Lazarus
On 8/15/17, Tony Whyman via Lazarus  wrote:

> 2. Clean up the char type.
>
> Why shouldn't there be a single char
> type that intuitively represents a single character regardless of
> how many bytes are used to represent it.

You would have to define what "a single character" means in the first place.
This is especially important when it involves precomposed characters
and combining characters.
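
A minimal sketch of the precomposed versus combining issue, assuming Free Pascal 3.0 or later. Both strings below are normally rendered as the same visible character (e with an acute accent), yet they contain a different number of elements, so "a single character" cannot simply mean "a single element".

```pascal
program CombiningDemo;
{$mode objfpc}{$H+}

var
  Precomposed, Decomposed: UnicodeString;
begin
  // U+00E9: LATIN SMALL LETTER E WITH ACUTE, one code point
  Precomposed := WideChar($00E9);

  // 'e' followed by U+0301 COMBINING ACUTE ACCENT, two code points
  Decomposed := 'e';
  Decomposed := Decomposed + WideChar($0301);

  WriteLn('precomposed length: ', Length(Precomposed));
  WriteLn('decomposed length:  ', Length(Decomposed));
  WriteLn('compare equal:      ', Precomposed = Decomposed);
end.
```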

Bart


Re: [Lazarus] The new kid is growing up fast

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 13:19, Graeme Geldenhuys via Lazarus wrote:

Just wanted to show you guys something.


Great.

CrossVCL seems to allow Delphi VCL applications to be ported easily to Mac 
and Linux.


How does it compare to Lazarus?

-Michael


[Lazarus] The new kid is growing up fast

2017-08-15 Thread Graeme Geldenhuys via Lazarus

Hi guys,

Just wanted to show you guys something. The new kid on the block is 
growing up very fast: CrossVCL.


   https://www.youtube.com/watch?v=_lr_BQlXvkk

I believe the programmer is the ex-FMX (FireMonkey) developer that was 
let go by Embarcadero, and he is hitting back with a vengeance. The 
CrossVCL project has grown from nothing to something in an extremely 
short time. Coming from a toolkit designer myself, that is very 
impressive to see.



Regards,
  Graeme

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus



On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:


On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:

What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?
Regarding the users' appreciation, the S[x] notation is decently 
incompatible between the different string types and compiler versions.


Of course not.

It's 1 byte for ansistring, 2 bytes for widestring.

The point is that the compiler knows how many bytes it is based on the
declaration of S. In your proposal, it is dynamic, if I understand it
correctly.

There were hundreds of complaints in all the appropriate forums and 
mailing lists.


Complaints about what exactly ?



So not much additional harm can be done, anyway.

I suggest that it should be according to the character_size definition 
stored in S, and the operation c := S[x] should transfer the appropriate 
count of bits, provided the type of c allows for taking them.


As far as I understand your proposal, this currently cannot be done ?

The compiler needs to know the S[X] size at compile time.
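
A minimal sketch of what "known at compile time" means here, assuming Free Pascal 3.0 or later. The element size follows from the declared type of the string variable, so SizeOf can be evaluated by the compiler without looking at the string's content.

```pascal
program ElementSizes;
{$mode objfpc}{$H+}

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'abc';
  U := 'abc';
  // The size of S[x] is fixed by the declaration of S, not by its content.
  WriteLn('AnsiString element:    ', SizeOf(A[2]), ' byte(s)');
  WriteLn('UnicodeString element: ', SizeOf(U[2]), ' byte(s)');
end.
```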

Michael.


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 12:15, Michael Van Canneyt via Lazarus wrote:

What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?
Regarding the users' appreciation, the S[x] notation is decently 
incompatible between the different string types and compiler versions.


There were hundreds of complaints in all the appropriate forums and 
mailing lists.


So not much additional harm can be done, anyway.

I suggest that it should be according to the character_size definition 
stored in S, and the operation c := S[x] should transfer the appropriate 
count of bits, provided the type of c allows for taking them.


This seems to be compatible to the current implementation of any 1-Byte 
brand and UTF16.


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 12:11, Mattias Gaertner via Lazarus wrote:
It does not explain what the characters of DynamicString are, does it? 


I don't understand what you are asking.

The element size and encoding of a Dynamic String ("CP_ANY" in the 
document) are not predefined, but depend on the content:


http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support 
-> Defining String variables and String types:
*CP_ANY* = $FF00
  // ElementSize dynamically assigned
  // fully dynamical String for intermediate storing string content
  // just assigned to the Type or variable, never used in the
  // "Encoding" field in the string header.



Hence it stores the "branding" when it is assigned to from a string with 
a fixed branding (such as *CP_UTF8*), and the content is auto-converted 
if necessary when assigning from CP_ANY to a fixed branded string variable.



If (in your example) the data is read from a file, a StringList based on 
CP_ANY Strings would keep the encoding/char_size of the data as it is 
in the file (it would need to somehow get to know the presumed encoding 
of the file, anyway) and store that information in the 
EncodingBrandNumber and ElementSize fields (which do exist in any 
"NewString" variable, anyway), in each String read.


If the user assigns an element of the stringlist to a fixed branding 
(such as *CP_UTF8*), the content obviously is auto-converted if 
necessary when assigning from CP_ANY to a fixed branded string 
variable, as usual.


In fact I suppose that the current implementation of TStringlist does 
not use new strings to store the data on the heap, but I never said that 
trying to implement such an idea would not require a lot of work.


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus



On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:


On Tue, 15 Aug 2017 12:02:28 +0200
Michael Schnell via Lazarus  wrote:


On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
> This cannot be solved properly except by duplicating the classes unit. 

Sorry to disagree, but IMHO this can only be solved properly by defining 
an additional fully dynamically encoded string type and use same for 
TStrings (see -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support 
)


It does not explain what the characters of DynamicString are, does it?


I was just going to write that.

The problem of the element size is circumvented by simply not digging into it.

What does S[2] mean in your proposal ? Is it 1, 2, 4 or even 8 bytes ?


Michael.



Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Tue, 15 Aug 2017 12:02:28 +0200
Michael Schnell via Lazarus  wrote:

> On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:
> > This cannot be solved properly except by duplicating the classes unit.  
> 
> Sorry to disagree, but IMHO this can only be solved properly by defining 
> an additional fully dynamically encoded string type and use same for 
> TStrings (see -> 
> http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support
>  
> )

It does not explain what the characters of DynamicString are, does it?

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 11:52, Michael Van Canneyt via Lazarus wrote:

This cannot be solved properly except by duplicating the classes unit.


Sorry to disagree, but IMHO this can only be solved properly by defining 
an additional fully dynamically encoded string type and use same for 
TStrings (see -> 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support 
)


But I am perfectly aware that implementing this would be a huge effort 
(see other mail here), and nobody is entitled to ask for this. (I wrote 
the article just to elaborate on what was discussed in the fpc mailing list 
at that time.)


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 11:15, Tony Whyman via Lazarus wrote:


In this case, I would argue that both are true.
And the culprit obviously is Embarcadero and not the fpc or the Lazarus 
team, who did their best to create a compatible implementation 
that is really workable on the multiple supported platforms (which E$ 
did not feel necessary when they released the encoding aware strings).


Maybe a better solution can be found, but who would want to nudge the 
fpc / Lazarus developers to invest a huge amount of time to create it 
and then make sure it is decently tested and stable ?


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus



On Tue, 15 Aug 2017, Michael Schnell via Lazarus wrote:


On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:

WideString/UnicodeString for those that want 2-byte characters.
A codepage-aware single-byte string for those that want 1-byte 
characters.

The shortstring is even still available.


IM (often stated) O, this does not help as long as TStrings does not 
support, without forced auto-conversion, the string type the user is 
inclined to choose.


Please check TStrings in trunk. This exists.

procedure LoadFromFile(const FileName: string; AEncoding: TEncoding); overload; virtual;
procedure LoadFromStream(Stream: TStream; AEncoding: TEncoding); overload; virtual;
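
A minimal usage sketch for the LoadFromFile overload quoted above, assuming an FPC version whose Classes unit contains it (at the time of this mail it was in trunk) and whose SysUtils unit provides TEncoding. The file name is taken from the command line and UTF-8 is only an example encoding. On Unix the cwstring unit is included so that conversions have a widestring manager available.

```pascal
program LoadWithEncoding;
{$mode objfpc}{$H+}
uses
  {$ifdef unix}cwstring,{$endif}
  Classes, SysUtils;

var
  Lines: TStringList;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: LoadWithEncoding <file>');
    Halt(1);
  end;

  Lines := TStringList.Create;
  try
    // Tell the list how the file is encoded instead of letting it guess.
    Lines.LoadFromFile(ParamStr(1), TEncoding.UTF8);
    WriteLn(Lines.Count, ' line(s) read');
  finally
    Lines.Free;
  end;
end.
```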

The only 'problem' is that TStrings uses a single-byte string.

This cannot be solved properly except by duplicating the classes unit.

Michael.


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 15.08.2017 11:25, Michael Van Canneyt via Lazarus wrote:

WideString/UnicodeString for those that want 2-byte characters.
A codepage-aware single-byte string for those that want 1-byte 
characters.

The shortstring is even still available.


IM (often stated) O, this does not help as long as TStrings does not 
support, without forced auto-conversion, the string type the user is 
inclined to choose.


This obviously requires an (additional) fully dynamic string brand.

This (again obviously) is not the "Embarcadero way", but supposedly does 
not necessarily lead to incompatibility regarding the user code.


-Michael



Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Van Canneyt via Lazarus



On Tue, 15 Aug 2017, Mattias Gaertner via Lazarus wrote:


On Mon, 14 Aug 2017 18:47:58 +0200
Sven Barth via Lazarus  wrote:


[...]
The main problem of such a dynamic type would be the inability to do fast
indexing as the compiler would need to insert runtime checks for the size
of a character. I had already thought the same, but then had to discard the
idea due to this.
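
A minimal sketch of the runtime check mentioned here, using a purely hypothetical record TDynString and function ElementAt; neither exists in FPC. With a fixed element size the compiler can turn S[i] into a single address calculation. With a size stored in the string itself, every access has to branch on that size first.

```pascal
program DynIndex;
{$mode objfpc}{$H+}

type
  // Hypothetical layout, for illustration only
  TDynString = record
    ElementSize: Byte;      // 1, 2 or 4 bytes per element
    Data: array of Byte;    // raw storage
  end;

// Zero-based element access. The case statement is the per-access
// runtime check that a statically typed string does not need.
function ElementAt(const S: TDynString; Index: SizeInt): Cardinal;
var
  P: PByte;
begin
  P := @S.Data[Index * S.ElementSize];
  case S.ElementSize of
    1: Result := P^;
    2: Result := PWord(P)^;
    4: Result := PCardinal(P)^;
  else
    Result := 0;
  end;
end;

var
  S: TDynString;
begin
  S.ElementSize := 2;
  SetLength(S.Data, 4);
  PWord(@S.Data[0])^ := $0041;   // element 0
  PWord(@S.Data[2])^ := $00E9;   // element 1
  WriteLn(HexStr(ElementAt(S, 1), 4));
end.
```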


IMHO the main problem of adding a new string type is
https://xkcd.com/927/


Exactly. I don't think we should add even more.

As it is now, FPC offers a way out for all cases:

WideString/UnicodeString for those that want 2-byte characters.
A codepage-aware single-byte string for those that want 1-byte characters.
The shortstring is even still available.

Attempting to store binary data in a string is not advisable. 
Dynamic arrays, TBytes and - in the worst case - TBytesStream are powerful enough to
cover most use-cases in this area.
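
A minimal sketch of collecting binary data in a TBytesStream, assuming Free Pascal with the Classes and SysUtils units. The byte values are arbitrary examples. The stream grows as bytes are written, and the finished data is copied out as a TBytes of exactly the written size, because the stream's internal buffer may be larger than its content.

```pascal
program BytesStreamDemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;

var
  Stream: TBytesStream;
  Data: TBytes;
  i: Integer;
begin
  Stream := TBytesStream.Create(nil);   // start with an empty buffer
  try
    Stream.WriteByte($C5);
    Stream.WriteByte($00);
    Stream.WriteByte($FF);
    // Bytes exposes the internal buffer; copy only the part that was written.
    Data := Copy(Stream.Bytes, 0, Stream.Size);
  finally
    Stream.Free;
  end;

  for i := 0 to High(Data) do
    Write(IntToHex(Data[i], 2), ' ');
  WriteLn;
end.
```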

Michael.


Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus

You can count me as a "like" on that one.


On 15/08/17 10:13, Mattias Gaertner via Lazarus wrote:

IMHO the main problem of adding a new string type is
https://xkcd.com/927/




Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus

On 14/08/17 22:01, Juha Manninen via Lazarus wrote:

Tony Whyman, this issue has been discussed again and again for the
past 10+ years first in FPC mailing lists and then in Lazarus lists.
The current Unicode support in Lazarus works f***ing well and is
amazingly compatible with Delphi.
WinAPI parameters may require an explicit temporary UnicodeString
variable but even then the code is compatible with Delphi.

Tony Whyman, Marcos Douglas and Michael Schnell, please study the facts.
For starters, this is about the current Unicode support in Lazarus:
   http://wiki.freepascal.org/Unicode_Support_in_Lazarus
I think the dynamic encoding and automatic conversion now work perfectly well.
If you have a piece of code where it does not work, please ask for
detailed info.
If a topic keeps on being discussed after 10+ years of argument, the 
reason is usually either (a) the problem and its solution have not been 
documented properly, or (b) the outcome is an unsatisfactory compromise.


In this case, I would argue that both are true.

I went back and read the wiki article you mentioned and was no more the 
wiser as to why the current mess exists. Is it really no more than 
because Delphi continues to screw up in this area, so must FPC? The body 
of the article appears to be a set of notes - not necessarily wrong in 
themselves but lacking the background and context needed to explain why 
it is like it is.


This problem will keep coming up until it is fixed properly and, by 
that, I mean that the solution is consistent, intuitively understandable 
and well documented. Windows eccentricities also need to be kept to Windows.


Here is my wish list:

1. Stop using the term "Unicode".

   It is too ambiguous. It is used as both an all embracing term for
   multi-byte encoding and as a synonym for UTF16 and that is really
   too confusing. The problem is made worse by having UnicodeString as
   a two byte wide string type in both FPC and Delphi.


2. Clean up the char type.

   When Wirth created the "char" type in Pascal it was a simple ASCII
   or EBCDIC character. There are now seven different char types
   (including type equivalence) with no guidelines on when each is
   applicable. This is too many. Why shouldn't there be a single char
   type that intuitively represents a single character regardless of
   how many bytes are used to represent it? Yes, in a world where we
   have to live with UTF8, UTF16, UTF32, legacy code pages and Chinese
   variations on UTF8, that means that dynamic attributes have to be
   included in the type. But isn't that the only way to have consistent
   and intuitive character handling?


3. The problem with string handling today is that it is not based on a 
consistent approach to the character type.


   If you clean up character handling then the model for string
   handling should become obvious. A string is, after all, no more than a
   container for a character array, all of which should be constrained to
   the same character encoding. A string should intuitively
   represent a string of text regardless of how many bytes are used to
   represent each character and with dynamic attributes to tell you how
   it is encoded.


4. FPC should clean up Delphi's mess for it. If a unified string type 
follows a consistent model then it should be possible to make all Delphi 
string types synonyms.


   You will need to allow exceptions for legacy programs that insist on
   manipulating the bytes themselves - but that is not rocket science.
   There is also the issue of the Windows API and its insistence on
   Wide Strings, but isn't that why calling conventions such as cdecl
   and stdcall exist: to tell the compiler when it needs to reformat
   the call for a given API convention?
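
To make items 2 and 3 concrete, here is a minimal sketch, assuming Free Pascal 3.x in objfpc mode (the program name and variable names are made up for illustration). It prints the element size of three of the existing char types and shows that Length counts storage elements rather than characters, which is the gap between "char" and "character" the wish list is about.

```pascal
program CharUnits;
{$mode objfpc}{$H+}

var
  U8: RawByteString;
  U16: UnicodeString;
begin
  // Element sizes of three of the char types, in bytes.
  WriteLn(SizeOf(AnsiChar));  // 1
  WriteLn(SizeOf(WideChar));  // 2
  WriteLn(SizeOf(UCS4Char));  // 4

  // The same character, U+00E9, stored in two encodings.
  SetLength(U8, 2);
  U8[1] := AnsiChar($C3);     // UTF-8 lead byte
  U8[2] := AnsiChar($A9);     // UTF-8 continuation byte
  U16 := WideChar($00E9);     // a single UTF-16 code unit

  // Length counts elements (code units), not characters.
  WriteLn(Length(U8));        // 2
  WriteLn(Length(U16));       // 1
end.
```

Both strings hold one character, yet indexing U8[1] yields only half of it. A single intuitive char type would have to hide exactly this difference.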

Tony Whyman





Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Mon, 14 Aug 2017 18:47:58 +0200
Sven Barth via Lazarus  wrote:

>[...]
> The main problem of such a dynamic type would be the inability to do fast
> indexing as the compiler would need to insert runtime checks for the size
> of a character. I had already thought the same, but then had to discard the
> idea due to this.

IMHO the main problem of adding a new string type is
https://xkcd.com/927/

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Mattias Gaertner via Lazarus
On Sat, 12 Aug 2017 17:56:58 -0300
"Marcos Douglas B. Santos via Lazarus" 
wrote:

>[...]
> > Which one? Do you mean Windows CP-1252?  
> 
> Yes...
> But would it make any difference?

Just

> >>[...]
> >> Warning: Implicit string type conversion from "AnsiString" to "WideString" 
> >>  
> >
> > Explicit type cast:
> >
> > Lib.SetLicense(
> >WideString(IniFile.ReadString('TheLib', 'license', ''))
> > );  
> 
> Wow... everywhere? :(

You could instead define an overloaded Lib.SetLicense(AnsiString). Or
you could disable this warning altogether for your project (not
recommended): select the message in the Messages window, right-click
it and click on add -vm
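
A minimal sketch of the overload idea, assuming Free Pascal in objfpc mode. TLib is a hypothetical stand-in for whatever class sits behind Lib in the original code; if that class cannot be modified, the same conversion could live in a small wrapper procedure instead. The point is that the explicit WideString cast is written once rather than at every call site.

```pascal
program OverloadDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the class behind "Lib" in the thread.
  TLib = class
  private
    FLicense: WideString;
  public
    procedure SetLicense(const AValue: WideString); overload;
    procedure SetLicense(const AValue: AnsiString); overload;
    property License: WideString read FLicense;
  end;

procedure TLib.SetLicense(const AValue: WideString);
begin
  FLicense := AValue;
end;

procedure TLib.SetLicense(const AValue: AnsiString);
begin
  // The one explicit conversion, instead of a cast at every call site.
  SetLicense(WideString(AValue));
end;

var
  Lib: TLib;
  Key: AnsiString;
begin
  Lib := TLib.Create;
  try
    Key := 'ABC-123';          // in the thread this comes from IniFile.ReadString
    Lib.SetLicense(Key);       // resolves to the AnsiString overload
    WriteLn(Length(Lib.License));  // 7
  finally
    Lib.Free;
  end;
end.
```

Callers that already hold an AnsiString then compile without the implicit-conversion message, because the conversion inside the overload is explicit.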

Mattias


Re: [Lazarus] String vs WideString

2017-08-15 Thread Tony Whyman via Lazarus

On 14/08/17 17:47, Sven Barth via Lazarus wrote:
> The main problem of such a dynamic type would be the inability to do 
> fast indexing as the compiler would need to insert runtime checks for 
> the size of a character. I had already thought the same, but then had 
> to discard the idea due to this.


Is this really a big problem? It is not as if it would be necessary to 
do a table lookup every time you index a string, as the indexing method 
could be an attribute of the string and updated together with the 
character encoding attribute. Is it really that complicated for the 
compiler to generate code that jumps to an indexing method depending 
upon a data attribute?
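
A rough sketch of what "an indexing method as an attribute of the string" could look like, assuming Free Pascal in objfpc mode. TDynString, TUnitReader and the two reader functions are invented for illustration and are not an existing FPC type or API; the 16-bit reader assumes little-endian storage.

```pascal
program DynIndexDemo;
{$mode objfpc}{$H+}

type
  TRawData = array of Byte;
  TUnitReader = function(const Data: TRawData; Index: SizeInt): Cardinal;

  // Hypothetical dynamic string: raw bytes plus the indexing method
  // that matches the current encoding.
  TDynString = record
    Data: TRawData;
    ReadUnit: TUnitReader;  // would be updated whenever the encoding changes
  end;

function ReadUnit8(const Data: TRawData; Index: SizeInt): Cardinal;
begin
  Result := Data[Index];
end;

function ReadUnit16LE(const Data: TRawData; Index: SizeInt): Cardinal;
begin
  Result := Data[2 * Index] or (Cardinal(Data[2 * Index + 1]) shl 8);
end;

// Every element access goes through the stored function pointer.
function UnitAt(const S: TDynString; Index: SizeInt): Cardinal;
begin
  Result := S.ReadUnit(S.Data, Index);
end;

var
  A, W: TDynString;
begin
  SetLength(A.Data, 2);
  A.Data[0] := $41; A.Data[1] := $42;
  A.ReadUnit := @ReadUnit8;

  SetLength(W.Data, 4);
  W.Data[0] := $41; W.Data[1] := $00;
  W.Data[2] := $E9; W.Data[3] := $00;
  W.ReadUnit := @ReadUnit16LE;

  WriteLn(UnitAt(A, 1));  // 66
  WriteLn(UnitAt(W, 1));  // 233
end.
```

The indirect call in UnitAt is the per-access cost under discussion: it replaces the single scaled memory load that a fixed-width string type allows.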


Is your problem really more about the result type as, depending on the 
character width, the result could be an AnsiChar or WideChar or a UTF8 
character for which I don't believe there is a defined char type (other 
than an arguable misuse of UCS4Char)?
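
To illustrate the result-type question, here is a small sketch, assuming Free Pascal in objfpc mode, that reads one UTF-8 encoded character and returns it as a UCS4Char. DecodeUTF8At and Cont are hypothetical helpers written for this example; they do no validation of malformed input and are not part of the RTL.

```pascal
program Utf8Decode;
{$mode objfpc}{$H+}

// Payload bits of the continuation byte at position I.
function Cont(const S: RawByteString; I: SizeInt): Cardinal;
begin
  Result := Ord(S[I]) and $3F;
end;

// Decodes the character starting at byte Index (1-based) and reports
// how many bytes it occupies. Assumes well-formed UTF-8.
function DecodeUTF8At(const S: RawByteString; Index: SizeInt;
  out Size: SizeInt): UCS4Char;
var
  B: Cardinal;
begin
  B := Ord(S[Index]);
  if B < $80 then
  begin
    Size := 1;
    Result := B;
  end
  else if (B and $E0) = $C0 then
  begin
    Size := 2;
    Result := ((B and $1F) shl 6) or Cont(S, Index + 1);
  end
  else if (B and $F0) = $E0 then
  begin
    Size := 3;
    Result := ((B and $0F) shl 12) or (Cont(S, Index + 1) shl 6)
      or Cont(S, Index + 2);
  end
  else if (B and $F8) = $F0 then
  begin
    Size := 4;
    Result := ((B and $07) shl 18) or (Cont(S, Index + 1) shl 12)
      or (Cont(S, Index + 2) shl 6) or Cont(S, Index + 3);
  end
  else
  begin
    Size := 1;
    Result := $FFFD;  // replacement character for an unexpected lead byte
  end;
end;

var
  S: RawByteString;
  N: SizeInt;
  C: UCS4Char;
begin
  SetLength(S, 3);
  S[1] := 'A';
  S[2] := AnsiChar($C3);  // U+00E9 as two UTF-8 bytes
  S[3] := AnsiChar($A9);
  C := DecodeUTF8At(S, 2, N);
  WriteLn(Cardinal(C));   // 233
  WriteLn(N);             // 2
end.
```

The function has to return a code point type wider than any single element of the string, which is the mismatch the paragraph above points at.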


I can accept that a clean-up of this area would also have to extend to 
the char types, but I would argue that this is well overdue. On a 
quick count, I found 7 different char types in the system unit.



Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 14.08.2017 18:47, Sven Barth via Lazarus wrote:


> The main problem of such a dynamic type would be the inability to do 
> fast indexing as the compiler would need to insert runtime checks for 
> the size of a character.



What "indexing" are you thinking of?
Could you give an example where such a difference is supposed to 
become important?


(As you know, I wrote a paper in which I claimed the contrary. I'd like 
to revise it if necessary.)


-Michael


Re: [Lazarus] String vs WideString

2017-08-15 Thread Michael Schnell via Lazarus

On 14.08.2017 18:49, Sven Barth via Lazarus wrote:


> Because the crowd demanding Delphi compatibility is larger than the 
> crowd demanding exact terminology.



... or even a revised concept avoiding the junk presented by Embarcadero :(

But obviously the fpc team has no choice.

-Michael