Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-03 Thread Michael Schnell
On 12/03/2014 05:02 AM, Hans-Peter Diettrich wrote: Michael Schnell wrote: - It does not result in additional conversions. It does, e.g. in searching or sorting of StringList, when it can contain strings of different encodings. The choice of a unique encoding for application strings (maybe

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-03 Thread Michael Schnell
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote: You forget that Jonas refers to *dynamic* string encodings, unknown at compile time. ??? In your other mail you pointed out that fpc (unlike Delphi) does not provide *dynamic* string encoding with RawByteString (and where else would it

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-03 Thread Michael Schnell
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote: In Delphi *no* string can have a dynamic encoding of CP_NONE or CP_ACP. If you really do have dynamic strings, obviously the *definition* (i.e. CP_...) of such strings is strictly static (just for compiler use) and can never be used as

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-02 Thread Michael Schnell
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote: You suggested to use string as UTF-16 on Windows, and UTF-8 on Linux. That's what I understand as a unique program-wide string representation (not sourcecode-wide, instead program as *compiled*). Then I cannot see any need or use for
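
A minimal sketch of that suggestion, assuming FPC 3.x: a hypothetical application-wide alias `TAppString` (not an RTL type) that is `UnicodeString` on Windows and `UTF8String` elsewhere, selected with the predefined `MSWINDOWS` symbol. The program only reports which element size the chosen representation has.

```pascal
program AppStringDemo;
{$mode objfpc}{$H+}

type
  { Hypothetical program-wide string type: UTF-16 on Windows,
    UTF-8 on other targets, fixed at compile time. }
{$ifdef MSWINDOWS}
  TAppString = UnicodeString;
{$else}
  TAppString = UTF8String;
{$endif}

var
  S: TAppString;
begin
  S := 'abc';
  S := S + 'd';  // built at run time, so S has a real string header
  // Expected: 2 for the UnicodeString variant, 1 for the UTF8String variant
  WriteLn('element size of TAppString: ', StringElementSize(S));
end.
```

With this approach all code inside the program sees one encoding per platform, and conversions are only needed where data enters or leaves the program.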

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-02 Thread Michael Schnell
On 12/02/2014 01:05 PM, Michael Schnell wrote: But why do you say "would be appreciated"? Is it not possible to use RawByteString in a way the name suggests, by never bringing it together with any String variable of a different encoding brand and hence avoid any conversion - be same

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-02 Thread Michael Schnell
On 11/29/2014 07:55 AM, Jonas Maebe wrote: Exactly the same goes for converting strings with code page CP_NONE to a different code page: your program is broken when it tries to do that, While accessing an array beyond its bounds is not detectable at compile time and accessing an array beyond

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-12-02 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote: Apart from that, every encoding-tolerant code will execute much slower than code without a need for checks and conversions everywhere. As I pointed out, I don't agree at all. - The check is only two ASM instructions
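
To make the cost argument concrete, here is a minimal Free Pascal sketch (FPC 3.x assumed). `EnsureCodePage` is a hypothetical helper, not an RTL routine: it reads the dynamic code page with `StringCodePage` and only calls `SetCodePage(..., True)` when it differs from the target, so in the matching case the extra work is a header read and a comparison. Code page 28591 (ISO-8859-1) is an arbitrary second encoding chosen for the demo, and the data is ASCII-only so no conversion can lose characters.

```pascal
program CheckThenConvert;
{$mode objfpc}{$H+}

var
  Conversions: Integer = 0;  // how often a real conversion was requested

{ Hypothetical helper, not an RTL routine: compare the dynamic code page
  stored in the string header with Target and convert only on mismatch. }
procedure EnsureCodePage(var S: RawByteString; Target: TSystemCodePage);
begin
  if StringCodePage(S) <> Target then
  begin
    SetCodePage(S, Target, True);  // True: convert the bytes
    Inc(Conversions);
  end;
end;

var
  S: RawByteString;
begin
  S := 'abc';                      // ASCII only, so conversions are lossless
  SetCodePage(S, 28591, False);    // label as ISO-8859-1 without converting
  EnsureCodePage(S, 28591);        // same code page: only the comparison runs
  EnsureCodePage(S, CP_UTF8);      // different code page: converted
  WriteLn('conversions: ', Conversions, ', code page: ', StringCodePage(S));
end.
```

Whether the comparison really compiles down to "two ASM instructions" depends on the target and optimizer; the sketch only shows that the check itself is cheap compared to an actual conversion.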

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-29 Thread Hans-Peter Diettrich
Jonas Maebe wrote: On 28/11/14 21:30, Hans-Peter Diettrich wrote: I prefer to specify and document everything *before* coding, so that everybody can expect that the code will behave as specified. If certain behaviour is explicitly undefined, it *is* specified and documented. It means that

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Jonas Maebe
On 27 Nov 2014, at 17:11, Hans-Peter Diettrich drdiettri...@aol.com wrote: Such statements come only from writers that do not believe that their words can be understood in various ways ;-) I'm sorry, but I simply cannot discuss with people that, when I literally state the result is

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Michael Schnell
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote: The universal paradigm would allow for extensions (e.g. UTF-32, multiple 16-bit code pages, an additional fully dynamic String type, n-byte un-encoded string types), as I described in the Wiki page. Even if feasible, such arbitrary string

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Michael Schnell
On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote: Michael Schnell wrote: E.g. there are (at least) two "Code pages" for UTF-16 (LE and BE) that would be worth supporting. You are confusing codepages and encodings :-( That is why I put quotation marks around "Code pages". I used this wording

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Hans-Peter Diettrich
Jonas Maebe wrote: I'm sorry, but I simply cannot discuss with people that, when I literally state the result is undefined, think that I may actually have meant the result is defined and if you change the implementation and/or keep it stable across compiler releases, then it will also conform

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Hans-Peter Diettrich
Michael Schnell wrote: I fear that there will be code that relies on the flawed behavior of RawByteString (it's a feature, not a bug) and using the same name with different behavior would break same. And a really usable DynamicString would not adhere to that description. How can somebody

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote: An *efficient* implementation would be based on a single program-wide string representation, with different encodings being handled only in an exchange with external data sources. Yep. But it would result in severe
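
The single-representation approach can be sketched as follows, assuming FPC 3.x: bytes from an external source are labelled with their known code page via `SetCodePage(..., False)` and converted to `UnicodeString` exactly once, at the boundary. `DecodeExternal` is a hypothetical name, and the ASCII-only sample data keeps the result independent of which widestring manager is installed.

```pascal
program BoundaryDemo;
{$mode objfpc}{$H+}

{ Hypothetical input routine: the bytes come from outside with a known
  encoding; they are labelled and then converted exactly once. }
function DecodeExternal(const Bytes: RawByteString;
  SourceCP: TSystemCodePage): UnicodeString;
var
  Tmp: RawByteString;
begin
  Tmp := Bytes;
  SetCodePage(Tmp, SourceCP, False);  // attach the known encoding, no conversion
  Result := UnicodeString(Tmp);       // the single conversion, at the boundary
end;

var
  A, B: UnicodeString;
begin
  A := DecodeExternal('pear', CP_UTF8);
  B := DecodeExternal('apple', 28591);  // ISO-8859-1, chosen for illustration
  // Inside the program both values share one representation, so comparing
  // or sorting them needs no further code page handling.
  if A > B then
    WriteLn(B, ' < ', A)
  else
    WriteLn(A, ' <= ', B);
end.
```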

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-28 Thread Jonas Maebe
On 28/11/14 21:30, Hans-Peter Diettrich wrote: I prefer to specify and document everything *before* coding, so that everybody can expect that the code will behave as specified. If certain behaviour is explicitly undefined, it *is* specified and documented. It means that your program is buggy if

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: concatenated without data loss and that the result is then converted to the target string's encoding (except in case the target is RawByteString). How that is implemented exactly is undefined; again in the meaning of undefined, not in the

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell
On 11/26/2014 05:25 PM, Sven Barth wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. No, you can't, because the RTL does not handle that. For AnsiString the element size is *always* 1. It's
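
A small FPC 3.x sketch of the point above: code-page-aware AnsiString types differ only in their code page, while `StringElementSize` reports 1 for all of them. `TLatin1String` and code page 28591 (ISO-8859-1) are chosen purely for illustration; as stated in the thread, declaring `AnsiString(CP_UTF16)` would not turn such a type into a two-byte string.

```pascal
program AnsiElementSize;
{$mode objfpc}{$H+}

type
  // Illustrative type: same AnsiString storage, different code page
  TLatin1String = type AnsiString(28591);  // ISO-8859-1

var
  U8: UTF8String;       // declared in the RTL as type AnsiString(CP_UTF8)
  L1: TLatin1String;
begin
  U8 := 'abc';
  L1 := 'abc';
  // The element size is 1 byte for every AnsiString, whatever its code page
  WriteLn('UTF8String   : element size ', StringElementSize(U8));
  WriteLn('TLatin1String: element size ', StringElementSize(L1));
end.
```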

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell
On 11/26/2014 05:37 PM, Jonas Maebe wrote: invalid (in the meaning of undefined) in both FPC and Delphi. Sorry (I am not a native speaker). But to me undefined and invalid have completely different meanings (in this context). An invalid use of the language would result in an error (compiler

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell
On 11/26/2014 09:30 PM, Hans-Peter Diettrich wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. Not in Delphi XE. Thanks for the clarification. I did have some hope that fpc would be (or could be

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Michael Schnell
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote: Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array. Sorry for sloppy wording. Of course I did mean element size (Character here

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Jonas Maebe
On 26/11/14 23:41, Hans-Peter Diettrich wrote: In this case the implementation is compiler specific, somewhat different from undefined (in a RawByteString): CP_NONE: this value indicates that no code page information has been associated with the string data. The result of any explicit or
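
Given that conversions from or to CP_NONE are undefined, a program that receives raw bytes of a known encoding can attach that encoding itself rather than relying on any conversion. A minimal FPC 3.x sketch; `BytesAsUtf8` is a hypothetical helper, and the byte values are simply 'Hi' followed by U+00E9 encoded as UTF-8.

```pascal
program RelabelDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: copy raw bytes whose encoding is known to be UTF-8
  and attach that code page explicitly, instead of converting anything. }
function BytesAsUtf8(const Bytes: array of Byte): UTF8String;
var
  Raw: RawByteString;
begin
  SetLength(Raw, Length(Bytes));
  if Length(Bytes) > 0 then
    Move(Bytes[0], Raw[1], Length(Bytes));
  SetCodePage(Raw, CP_UTF8, False);  // False: relabel only, bytes untouched
  Result := Raw;                     // code pages now match: no conversion
end;

var
  S: UTF8String;
begin
  S := BytesAsUtf8([$48, $69, $C3, $A9]);  // 'Hi' + U+00E9 in UTF-8
  WriteLn(Length(S), ' bytes, code page ', StringCodePage(S));
end.
```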

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich
Michael Schnell wrote: I now understand that the Element Size field in the String header is effectively a dummy, as under the hood there are two completely separate concepts for one-byte strings and two-byte strings and none for other element sizes. After a code review I realized that the element

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote: Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array. Sorry for sloppy wording. Of course I did mean

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich
Jonas Maebe wrote: On 26/11/14 23:41, Hans-Peter Diettrich wrote: In this case the implementation is compiler specific, somewhat different from undefined (in a RawByteString): CP_NONE: this value indicates that no code page information has been associated with the string data. The result of

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-27 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote: An AnsiString consists of AnsiChars. The *meaning* of these chars (bytes) depends on their encoding, regardless of whether the used encoding is or is not stored with the string. I understand that the

[fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
I fail to understand some of the text. It seems to be unavoidable to use the name ANSIString even though I always throw up when seeing a thing called ANSI containing Unicode (e.g. UTF8String = type AnsiString(CP_UTF8)). Seemingly here the bytes per character setting implicitly is

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Mattias Gaertner
On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell mschn...@lumino.de wrote: [...] It seems to be unavoidable to use the name ANSIString even though I always throw up when seeing a thing called ANSI containing Unicode (e.g. UTF8String = type AnsiString(CP_UTF8)). Is there a question?

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
On 11/26/2014 11:40 AM, Mattias Gaertner wrote: Ansistring supports only one byte per character code pages. Even more confused. Am I wrong thinking that with code aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not right, than due later) ? What is a

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Sven Barth
On 26.11.2014 11:53, Michael Schnell mschn...@lumino.de wrote: On 11/26/2014 11:40 AM, Mattias Gaertner wrote: Ansistring supports only one byte per character code pages. Even more confused. Am I wrong thinking that with code aware Strings, for Delphi XE compatibility, in Windows CP_ACP

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Mattias Gaertner
On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell mschn...@lumino.de wrote: [...] 2) I fail to understand how with this explanation that seems to force auto conversion for assignments between types with different code page settings (also for CP_ACP) the static code page can differ from the

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Mattias Gaertner
On Wed, 26 Nov 2014 11:52:50 +0100 Michael Schnell mschn...@lumino.de wrote: On 11/26/2014 11:40 AM, Mattias Gaertner wrote: Ansistring supports only one byte per character code pages. Even more confused. Am I wrong thinking that with code aware Strings, for Delphi XE compatibility, in

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
On 11/26/2014 12:09 PM, Sven Barth wrote: In Delphi (and FPC) CP_ACP corresponds by default with the current system codepage (e.g. CP1252 on a German Windows). OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String without brackets which in

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
On 11/26/2014 12:13 PM, Mattias Gaertner wrote: In mode delphiunicode String=UnicodeString. I see. So even in Delphi XE where UnicodeString is denoted by CP_UTF16, the value of the constant CP_UTF16 is not the same as the value of the (constant or) variable CP_ACP, (while OTOH using the

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
On 11/26/2014 12:10 PM, Mattias Gaertner wrote: the results of conversions from/to the CP_NONE code page are undefined. ... because CP_NONE is not a real code page. So you understand result as what you would get when printing. In the context of this wiki page I would understand result as the

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
After re-reading, yet another question: In the section String concatenations there is no mention of auto-conversion. For statically typed Strings it's rather obvious that they will be auto-converted if appropriate. Technically - if differently encoded - they seem to be converted to Unicode

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Sven Barth
On 26.11.2014 12:37, Michael Schnell mschn...@lumino.de wrote: On 11/26/2014 12:09 PM, Sven Barth wrote: In Delphi (and FPC) CP_ACP corresponds by default with the current system codepage (e.g. CP1252 on a German Windows). OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Jonas Maebe
On 26/11/14 12:53, Michael Schnell wrote: [CP_NONE] Is this undefined in the meaning of not predictable by the user in the current version of fpc, or in the meaning of due to change when updating fpc. This undefined literally means undefined. It does not mean undefined in a meaning that is

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Michael Schnell
On 11/26/2014 03:05 PM, Sven Barth wrote: OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String without brackets which in turn is the same as String(CP_UTF16) ? Correct ? There is no String with brackets. You can only use AnsiString

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Jonas Maebe
On 26/11/14 13:11, Michael Schnell wrote: In section String concatenations there is no mentioning about auto-conversion. There is. For statically typed Strings it's rather obvious that they will be auto-converted if appropriate. It's probably rather obvious because it is literally mentioned
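
As a rough illustration of the concatenation rule described above (assuming FPC 3.x): with a `UTF8String` target the mixed-code-page result is converted to UTF-8, while with a `RawByteString` target no fixed target code page is imposed, so the sketch only prints whatever code page that result ends up with. The ISO-8859-1 type is an arbitrary choice, and the data is ASCII-only so that no characters can be lost.

```pascal
program ConcatDemo;
{$mode objfpc}{$H+}

type
  TLatin1String = type AnsiString(28591);  // illustrative: ISO-8859-1

var
  U8, R8: UTF8String;
  L1: TLatin1String;
  Raw: RawByteString;
begin
  U8 := 'abc';
  L1 := 'def';
  // Target UTF8String: the concatenated result is converted to UTF-8
  R8 := U8 + L1;
  // Target RawByteString: no fixed target code page is imposed
  Raw := U8 + L1;
  WriteLn(R8, '  code page ', StringCodePage(R8));
  WriteLn(Raw, '  code page ', StringCodePage(Raw));
end.
```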

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Sven Barth
On 26.11.2014 15:30, Mattias Gaertner nc-gaert...@netcologne.de wrote: On Wed, 26 Nov 2014 15:05:16 +0100 Sven Barth pascaldra...@googlemail.com wrote: [...] While both AnsiString and UnicodeString have the current codepage and the character size in their header record AFAIK

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Jonas Maebe
On 26/11/14 17:21, Sven Barth wrote: Yes, nevertheless the header record is the same for UnicodeString and AnsiString and thus it also has a codepage field which is always initialized to CP_UTF16 however. It can also be CP_UTF16BE (which it is on big endian FPC targets right now). Jonas
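
A quick way to look at these UnicodeString header fields from a program, assuming FPC 3.x, where `StringCodePage` and `StringElementSize` have `UnicodeString` overloads: per the message above, the code page is expected to be CP_UTF16 on little-endian targets and CP_UTF16BE on big-endian ones, with an element size of 2.

```pascal
program UnicodeHeaderDemo;
{$mode objfpc}{$H+}

var
  U: UnicodeString;
begin
  U := 'ab';
  U := U + 'c';  // built at run time, so U has a freshly allocated header
  WriteLn('code page:    ', StringCodePage(U));
  WriteLn('element size: ', StringElementSize(U));
  WriteLn('CP_UTF16 = ', CP_UTF16, ', CP_UTF16BE = ', CP_UTF16BE);
end.
```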

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Jonas Maebe
On 26/11/14 16:19, Michael Schnell wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. As several people have told you several times, that is invalid (in the meaning of undefined) in both FPC and Delphi.

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Mattias Gaertner
On Wed, 26 Nov 2014 17:23:48 +0100 Jonas Maebe jonas.ma...@elis.ugent.be wrote: On 26/11/14 17:21, Sven Barth wrote: Yes, nevertheless the header record is the same for UnicodeString and AnsiString and thus it also has a codepage field which is always initialized to CP_UTF16 however. It

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Mattias Gaertner
On Wed, 26 Nov 2014 17:50:31 +0100 Mattias Gaertner nc-gaert...@netcologne.de wrote: On Wed, 26 Nov 2014 17:23:48 +0100 Jonas Maebe jonas.ma...@elis.ugent.be wrote: On 26/11/14 17:21, Sven Barth wrote: Yes, nevertheless the header record is the same for UnicodeString and AnsiString

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell mschn...@lumino.de wrote: Seemingly here the bytes per character setting implicitly is thought of as a part of the code-page definition. Correct? Code pages define bytes per character. Huh? Not all codepages

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 11:40 AM, Mattias Gaertner wrote: Ansistring supports only one byte per character code pages. Even more confused. Am I wrong thinking that with code aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not right, than

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 12:09 PM, Sven Barth wrote: In Delphi (and FPC) CP_ACP corresponds by default with the current system codepage (e.g. CP1252 on a German Windows). OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Michael Schnell wrote: I fail to understand some of the text. It seems to be unavoidable to use the name ANSIString even though I always throw up when seeing a thing called ANSI containing Unicode (e.g. UTF8String = type AnsiString(CP_UTF8)). Seemingly here the bytes per character

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Mattias Gaertner wrote: For example: CP_ACP=0, DefaultSystemCodePage=1252. That means the static code page is always 0, while the dynamic code page can be 0 or 1252. Both describe the same encoding. A *dynamic* encoding can *never* be CP_ACP or CP_NONE (in Delphi). These values are allowed only
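
The static/dynamic distinction can be observed directly. In this FPC 3.x sketch, `S` is declared as plain `AnsiString` (static code page CP_ACP = 0); whether its dynamic code page prints as 0 or as the value of `DefaultSystemCodePage` is exactly the point under discussion, so the output should be read as informational rather than as a specification.

```pascal
program AcpDemo;
{$mode objfpc}{$H+}

var
  S: AnsiString;  // declared (static) code page: CP_ACP
begin
  S := 'x';
  S := S + 'y';   // built at run time
  WriteLn('CP_ACP:                ', CP_ACP);
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // Dynamic code page stored in the header; its exact value is what the
  // thread debates (0 versus the translated system code page)
  WriteLn('StringCodePage(S):     ', StringCodePage(S));
end.
```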

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Michael Schnell wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. Not in Delphi XE. DoDi

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Hans-Peter Diettrich
Jonas Maebe wrote: Technically, that section literally states that they will be concatenated without data loss and that the result is then converted to the target string's encoding (except in case the target is RawByteString). How that is implemented exactly is undefined; again in the meaning

Re: [fpc-devel] Trying to understand the wiki-Page FPC Unicode support

2014-11-26 Thread Sven Barth
On 26.11.2014 19:54, Hans-Peter Diettrich wrote: UTF-16 is not a valid value for CP_ACP in Delphi, because it's a 2-byte encoding. Even if the Delphi architects may have thought about a common string type, with a variable element size (1, 2, 4), this certainly soon turned out to be a stupid idea, so