Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-03 Michael Schnell
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote: In Delphi *no* string can have a dynamic encoding of CP_NONE or CP_ACP. If you really do have "Dynamic" strings, obviously, the *definition* (i.e. CP_...) of such strings is strictly static (just for compiler use) and can never be used as

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-03 Michael Schnell
On 12/03/2014 12:52 AM, Hans-Peter Diettrich wrote: You forget that Jonas refers to *dynamic* string encodings, unknown at compile time. ??? In your other mail you pointed out that fpc (unlike Delphi) does not provide *dynamic* string encoding with RawByteString (and where else would it

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-03 Michael Schnell
On 12/03/2014 10:42 AM, Michael Schnell wrote: That is why I tried to invent a concept. BTW: I can't help with the implementation, but I'll be happy to do testing and write documentation (e.g. in Wiki format). -Michael ___ fpc-devel maillist -

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-03 Michael Schnell
On 12/03/2014 05:02 AM, Hans-Peter Diettrich wrote: Michael Schnell wrote: - It does not result in additional conversions. It does, e.g. when searching or sorting a StringList that can contain strings of different encodings. The choice of a unique encoding for application strings (maybe C
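
A small Python model (not FPC code) of the sorting argument above: each entry pairs its stored bytes with a codec name that stands in for the string's dynamic code page. Sorting on the raw bytes ignores the code page and gives a different order than sorting the decoded text, so a correct comparison has to convert each entry first, which is the extra work Hans-Peter refers to. The sample words and helper names are invented for illustration, and Python orders by code point, not by locale collation.

```python
# Each entry models a string that carries its own dynamic code page:
# (stored bytes, Python codec name standing in for the code page).
entries = [
    ("ärger".encode("utf-8"), "utf-8"),
    ("éclair".encode("cp1252"), "cp1252"),
    ("über".encode("utf-8"), "utf-8"),
]

def sort_by_raw_bytes(items):
    """Naive sort on the stored bytes, ignoring each entry's code page."""
    return sorted(items, key=lambda item: item[0])

def sort_by_text(items):
    """Convert every entry to Unicode first (one conversion per entry), then sort."""
    return sorted(items, key=lambda item: item[0].decode(item[1]))

def as_text(items):
    """Decode a list of (bytes, codec) entries for display."""
    return [raw.decode(codec) for raw, codec in items]

print(as_text(sort_by_raw_bytes(entries)))  # ['ärger', 'über', 'éclair']
print(as_text(sort_by_text(entries)))       # ['ärger', 'éclair', 'über']
```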

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Hans-Peter Diettrich
Michael Schnell wrote: On 11/29/2014 07:55 AM, Jonas Maebe wrote: Exactly the same goes for converting strings with code page CP_NONE to a different code page: your program is broken when it tries to do that. While accessing an array beyond its bounds is not detectable at compile time and a

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Hans-Peter Diettrich
Michael Schnell wrote: On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote: Apart from that, any encoding-tolerant code will execute much slower than code that needs no checks and conversions everywhere. As I pointed out, I don't agree at all. - The check is only two ASM instructions

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Mattias Gaertner
On Tue, 02 Dec 2014 13:31:44 +0100 Michael Schnell wrote: >[...] *defined* *to* *be* *undefined* Ooh, that is soo meta. lol Mattias

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Michael Schnell
On 11/29/2014 05:36 PM, Hans-Peter Diettrich wrote: As Delphi doesn't allow for a dynamic encoding of CP_NONE, I don't understand the purpose of the FPC description. As you suggested in the other mail, the Delphi implementation of RawByteString is decently flawed and this supposedly is introduc

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Michael Schnell
On 11/29/2014 07:55 AM, Jonas Maebe wrote: Exactly the same goes for converting strings with code page CP_NONE to a different code page: your program is broken when it tries to do that. While accessing an array beyond its bounds is not detectable at compile time and accessing an array beyond i

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Michael Schnell
On 12/02/2014 01:05 PM, Michael Schnell wrote: But why do you say "would be appreciated"? Is it not possible to use "RawByteString" in the way the name suggests, by never bringing it together with any String variable of a different encoding brand and hence avoiding any conversion - be same intentio

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-12-02 Michael Schnell
On 11/28/2014 09:15 PM, Hans-Peter Diettrich wrote: You suggested using "string" as UTF-16 on Windows, and UTF-8 on Linux. That's what I understand as a unique program-wide string representation (not source-code-wide, but for the program as *compiled*). Then I cannot see any need or use for anoth

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-29 Hans-Peter Diettrich
Jonas Maebe wrote: On 28/11/14 21:30, Hans-Peter Diettrich wrote: I prefer to specify and document everything *before* coding, so that everybody can expect that the code will behave as specified. If certain behaviour is explicitly undefined, it *is* specified and documented. It means that yo

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Jonas Maebe
On 28/11/14 21:30, Hans-Peter Diettrich wrote: > I prefer to specify and document everything *before* coding, so that > everybody can expect that the code will behave as specified. If certain behaviour is explicitly undefined, it *is* specified and documented. It means that your program is buggy i

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Hans-Peter Diettrich
Michael Schnell wrote: I fear that there will be code that relies on the "flawed" behavior of RawByteString ("it's a feature, not a bug"), and using the same name with different behavior would break it. And a really usable DynamicString would not adhere to that description. How can somebo

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Hans-Peter Diettrich
Jonas Maebe wrote: I'm sorry, but I simply cannot discuss with people who, when I literally state "the result is undefined", think that I may actually have meant "the result is defined and if you change the implementation and/or keep it stable across compiler releases, then it will also confo

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Hans-Peter Diettrich
Michael Schnell wrote: On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote: An *efficient* implementation would be based on a single program-wide string representation, with different encodings being handled only when exchanging data with external sources. Yep. But it would result in severe u

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Michael Schnell
On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote: Michael Schnell wrote: E.g. there are at least two "Code pages" for UTF-16 ("LE" and "BE") that would be worth supporting. You are confusing codepages and encodings :-( That is why I put "goose-feet" (quotation marks) around "Code pages". I used this wo
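
To make the "LE"/"BE" distinction concrete, here is a short Python illustration (Python codec names, not FPC identifiers): the same text produces different byte sequences under the two byte orders, and decoding with the wrong one does not necessarily fail, it just yields other characters. On Windows these two byte orders correspond to the code page identifiers 1200 and 1201.

```python
text = "Aä"

le = text.encode("utf-16-le")  # low byte of each 16-bit unit first
be = text.encode("utf-16-be")  # high byte of each 16-bit unit first

print(le.hex(" "))  # 41 00 e4 00
print(be.hex(" "))  # 00 41 00 e4

# Reading LE bytes as BE raises no error here; it just produces different characters.
misread = le.decode("utf-16-be")
print(misread == text)  # False
```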

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Michael Schnell
On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote: The "universal paradigm" would allow for extensions (e.g. UTF-32, multiple 16-bit Code pages, an additional fully dynamic String type, n-byte "un-encoded" string types), as I described in the Wiki page. Even if feasible, such arbitrary string

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-28 Jonas Maebe
On 27 Nov 2014, at 17:11, Hans-Peter Diettrich wrote: > Such statements come only from writers that do not believe that their words > can be understood in various ways ;-) I'm sorry, but I simply cannot discuss with people who, when I literally state "the result is undefined", think that I

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote: An AnsiString consists of AnsiChars. The *meaning* of these chars (bytes) depends on their encoding, regardless of whether the used encoding is or is not stored with the string. I understand that the implementation

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Hans-Peter Diettrich
Jonas Maebe wrote: On 26/11/14 23:41, Hans-Peter Diettrich wrote: In this case the implementation is "compiler specific", somewhat different from "undefined" (in a RawByteString): "CP_NONE: this value indicates that no code page information has been associated with the string data. The result

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote: Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array. Sorry for sloppy wording. Of course I meant "ele

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Hans-Peter Diettrich
Michael Schnell wrote: I now understand that the "Element Size" field in the String header is essentially a dummy, as under the hood there are two completely separate concepts for one-byte strings and two-byte strings and none for other element sizes. After a code review I realized that the element

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Hans-Peter Diettrich
Sven Barth wrote: Just a little remark: please don't throw in WideString, which is a completely different type and only there for easy compatibility with COM and other Windows APIs. I mentioned it for completeness, and because (at least in Delphi) the elements of a UnicodeString are WideCh

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote: An AnsiString consists of AnsiChars. The *meaning* of these chars (bytes) depends on their encoding, regardless of whether the used encoding is or is not stored with the string. I understand that the implementation (in Delphi) seems to be d

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Jonas Maebe
On 26/11/14 23:41, Hans-Peter Diettrich wrote: > In this case the implementation is "compiler specific", somewhat > different from "undefined" (in a RawByteString): > "CP_NONE: this value indicates that no code page information has been > associated with the string data. The result of any explicit

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 07:13 PM, Hans-Peter Diettrich wrote: Not all codepages have a fixed number of bytes per character. The string preamble contains the *element size* (1 for AnsiString), just like with every dynamic array. Sorry for sloppy wording. Of course I meant "element size" ("Character" h

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 07:54 PM, Hans-Peter Diettrich wrote: Delphi XE does not properly support UTF-8. That is what I suspected. Of course the developers at Embarcadero did not need to think about portability to OSes other than Windows when crafting the concept. But obviously fpc needs proper support

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 09:30 PM, Hans-Peter Diettrich wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. Not in Delphi XE. Thanks for the clarification. I did have some hope that fpc would be (or could be extend

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 05:37 PM, Jonas Maebe wrote: invalid (in the meaning of "undefined") in both FPC and Delphi. Sorry (I am not a native speaker). But to me "undefined" and "invalid" have completely different meanings (in this context). An "invalid" use of the language would result in an error (com

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Michael Schnell
On 11/26/2014 05:25 PM, Sven Barth wrote: > > So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. No, you can't, because the RTL does not handle that. For AnsiString the element size is *always* 1. It's hard

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-27 Marco van de Voort
In our previous episode, Hans-Peter Diettrich said: > > concatenated without data loss and that the result is then converted to > > the target string's encoding (except in case the target is > > RawByteString). How that is implemented exactly is undefined; again in > > the meaning of "undefined", n

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Sven Barth
On 26.11.2014 19:54, Hans-Peter Diettrich wrote: UTF-16 is not a valid value for CP_ACP in Delphi, because it's a 2-byte encoding. Even if the Delphi architects may have thought about a common string type with a variable element size (1, 2, 4), this certainly soon turned out to be a stupid idea, so

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Jonas Maebe wrote: Technically, that section literally states that they will be concatenated without data loss and that the result is then converted to the target string's encoding (except in case the target is RawByteString). How that is implemented exactly is undefined; again in the meaning
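
The rule Jonas paraphrases (concatenate without data loss, then convert to the target's encoding) can be modelled in a few lines of Python. This is a sketch of the stated semantics only, not of how FPC implements them: `concat` is a hypothetical helper, codec names stand in for code pages, and the RawByteString exception is deliberately left out of the model.

```python
def concat(parts, target_codec):
    """Model of "concatenate without data loss, then convert to the target".

    parts: list of (bytes, codec) pairs; each codec stands in for a dynamic code page.
    target_codec: stands in for the declared code page of the target variable.
    """
    # Lossless intermediate: decode every operand to Unicode before joining.
    text = "".join(raw.decode(codec) for raw, codec in parts)
    # One final conversion into the target's encoding.
    return text.encode(target_codec), target_codec

greeting = ("Grüße ".encode("cp1252"), "cp1252")
cyrillic = ("Привет".encode("cp1251"), "cp1251")

result, codec = concat([greeting, cyrillic], "utf-8")
print(result.decode(codec))  # Grüße Привет
```

The final conversion can still lose data when the target cannot represent the text: with `"cp1252"` as the target, Python's `encode` raises `UnicodeEncodeError` for the Cyrillic part, whereas a real RTL might substitute replacement characters instead.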

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Michael Schnell wrote: So seemingly you could do MyStringType = type AnsiString(CP_UTF16), and seemingly the size information is set according to this. Not in Delphi XE. DoDi

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Mattias Gaertner wrote: For example: CP_ACP=0, DefaultSystemCodePage=1252. That means the static code page is always 0, while the dynamic code page can be 0 or 1252. Both describe the same encoding. A *dynamic* encoding can *never* be CP_ACP or CP_NONE (in Delphi). These values are allowed only for
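
Mattias's example implies that a "does this need a conversion?" check has to resolve CP_ACP before comparing code pages; otherwise 0 and 1252 would look different although they denote the same encoding. A minimal Python sketch, assuming a system whose DefaultSystemCodePage is 1252 as in the example; `resolve` and `same_encoding` are hypothetical names, while 0 and 65001 are the usual values of CP_ACP and CP_UTF8.

```python
CP_ACP = 0       # placeholder meaning "the system's ANSI code page"
CP_UTF8 = 65001

# Assumption taken from the example above: a Western European Windows system.
DEFAULT_SYSTEM_CODE_PAGE = 1252

def resolve(cp: int) -> int:
    """Replace the CP_ACP placeholder by the actual system code page."""
    return DEFAULT_SYSTEM_CODE_PAGE if cp == CP_ACP else cp

def same_encoding(cp_a: int, cp_b: int) -> bool:
    """True when no conversion is needed between the two code pages."""
    return resolve(cp_a) == resolve(cp_b)

print(same_encoding(CP_ACP, 1252))     # True: static 0 and dynamic 1252 agree
print(same_encoding(CP_ACP, CP_UTF8))  # False: a real conversion is needed
```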

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Michael Schnell wrote: I fail to understand some of the text. It seems to be unavoidable to use the name "ANSIString" even though I always throw up when seeing a thing called "ANSI" containing Unicode (e.g. "UTF8String = type AnsiString(CP_UTF8)"). Seemingly here the "bytes per chara

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 11:40 AM, Mattias Gaertner wrote: AnsiString supports only one-byte-per-character code pages. Even more confused. Am I wrong in thinking that with code-page-aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not right, then due

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Michael Schnell wrote: On 11/26/2014 12:09 PM, Sven Barth wrote: In Delphi (and FPC) CP_ACP corresponds by default to the current system codepage (e.g. CP1252 on a German Windows). OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String wi

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Hans-Peter Diettrich
Mattias Gaertner wrote: On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell wrote: Seemingly here the "bytes per character" setting implicitly is thought of as a part of the "code-page" definition. Correct? Code pages define bytes per character. Huh? Not all codepages have a fixed numbe
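
The difference between element size and bytes per character can be checked with Python's codecs, used here only as stand-ins for the code pages (`code_units` is a hypothetical helper). With a fixed element size of 1, one character may still need one or several elements depending on the code page, and even UTF-16 with element size 2 needs two elements for characters outside the Basic Multilingual Plane.

```python
def code_units(ch: str, codec: str, element_size: int = 1) -> int:
    """Number of string elements of `element_size` bytes that `ch` occupies."""
    return len(ch.encode(codec)) // element_size

print(code_units("é", "cp1252"))                 # 1 (single-byte code page)
print(code_units("é", "utf-8"))                  # 2
print(code_units("あ", "cp932"))                 # 2 (Shift-JIS, a double-byte ANSI code page)
print(code_units("\U0001F600", "utf-16-le", 2))  # 2 (surrogate pair)
```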

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 17:50:31 +0100 Mattias Gaertner wrote: > On Wed, 26 Nov 2014 17:23:48 +0100 > Jonas Maebe wrote: > > > On 26/11/14 17:21, Sven Barth wrote: > > > Yes, nevertheless the header record is the same for UnicodeString and > > > AnsiString and thus it also has a codepage field whic

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 17:23:48 +0100 Jonas Maebe wrote: > On 26/11/14 17:21, Sven Barth wrote: > > Yes, nevertheless the header record is the same for UnicodeString and > > AnsiString and thus it also has a codepage field which is always > > initialized to CP_UTF16 however. > > It can also be CP_U

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Jonas Maebe
On 26/11/14 16:19, Michael Schnell wrote: > So seemingly you could do MyStringType = type > AnsiString(CP_UTF16), and seemingly the size information is set > according to this. As several people have told you several times, that is invalid (in the meaning of "undefined") in both FPC and Delp
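
Why CP_UTF16 does not fit a type whose element size is always 1 becomes visible in the raw bytes. In this Python illustration (a model, not FPC code; `ansi_length` is a hypothetical stand-in for Length() on a 1-byte-element string), the length counts bytes rather than characters, and every ASCII character contributes a zero byte that code treating the data as a null-terminated PAnsiChar would take as the end of the string.

```python
def ansi_length(raw: bytes) -> int:
    """Model of Length() for a string whose element size is always 1: it counts bytes."""
    return len(raw)

utf16 = "Hi".encode("utf-16-le")

print(ansi_length(utf16))       # 4, although the text has 2 characters
print(list(utf16))              # [72, 0, 105, 0]
# A consumer that stops at the first zero byte (null-terminated view) sees only "H".
print(utf16.split(b"\x00")[0])  # b'H'
```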

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Jonas Maebe
On 26/11/14 17:21, Sven Barth wrote: > Yes, nevertheless the header record is the same for UnicodeString and > AnsiString and thus it also has a codepage field which is always > initialized to CP_UTF16 however. It can also be CP_UTF16BE (which it is on big-endian FPC targets right now). Jonas
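
A simplified Python model of the header fields mentioned in this exchange (code page and element size, plus the length). The field set is an assumption, not FPC's actual record, which for instance also holds a reference count; the values 1200 and 1201 for CP_UTF16 and CP_UTF16BE follow the Windows code page identifiers for the two UTF-16 byte orders. `StringHeader` and `unicode_string_header` are hypothetical names.

```python
import sys
from dataclasses import dataclass

CP_UTF16 = 1200    # UTF-16, little-endian byte order
CP_UTF16BE = 1201  # UTF-16, big-endian byte order

@dataclass
class StringHeader:
    """Simplified model of a string header; not FPC's exact record layout."""
    code_page: int
    element_size: int
    length: int  # in elements, not bytes

def unicode_string_header(text: str, big_endian: bool) -> StringHeader:
    """Model of the header a UnicodeString holding `text` carries on a target of the given endianness."""
    codec = "utf-16-be" if big_endian else "utf-16-le"
    units = len(text.encode(codec)) // 2
    return StringHeader(CP_UTF16BE if big_endian else CP_UTF16, 2, units)

# The code page field depends on the target's byte order, not on the text.
print(unicode_string_header("Hi", big_endian=(sys.byteorder == "big")))
```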

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Sven Barth
On 26.11.2014 15:30, "Mattias Gaertner" wrote: > > On Wed, 26 Nov 2014 15:05:16 +0100 > Sven Barth wrote: > > >[...] > > While both AnsiString and UnicodeString have the current codepage and the > > character size in their header record > > AFAIK UnicodeString has only a static (fixed) code page

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Jonas Maebe
On 26/11/14 13:11, Michael Schnell wrote: > In section "String concatenations" there is no mention of > auto-conversion. There is. > For statically typed Strings it's rather obvious that > they will be auto-converted if appropriate. It's probably rather obvious because it is literally ment

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
On 11/26/2014 03:05 PM, Sven Barth wrote: > > OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String without brackets which in turn is the same as String(CP_UTF16)? Correct? There is no "String with brackets". You can only use "AnsiString"

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Jonas Maebe
On 26/11/14 12:53, Michael Schnell wrote: [CP_NONE] > Is this "undefined" in the meaning of "not predictable by the user" in > the "current" version of fpc, or in the meaning of "due to change" when > updating fpc. This "undefined" literally means "undefined". It does not mean "undefined in a mean
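
Jonas's point is that "undefined" promises nothing, so correct code must simply never convert from or to CP_NONE. A tool or debug build could enforce that rule itself; the Python sketch below does so by raising an exception. It is a hypothetical checker (`convert`, `UndefinedConversion` and the `CODECS` table are made up), not FPC behaviour: there the documented result is simply undefined, with no guarantee of an error.

```python
CP_NONE = 0xFFFF  # value of CP_NONE in FPC and Delphi: "no code page information"
CODECS = {1252: "cp1252", 65001: "utf-8"}  # hypothetical code page -> codec table

class UndefinedConversion(Exception):
    """Raised by this checker where the documented result would merely be undefined."""

def convert(raw: bytes, from_cp: int, to_cp: int) -> bytes:
    """Convert between code pages, refusing anything that involves CP_NONE."""
    if CP_NONE in (from_cp, to_cp):
        raise UndefinedConversion(f"conversion {from_cp} -> {to_cp} involves CP_NONE")
    if from_cp == to_cp:
        return raw  # same code page: nothing to convert
    return raw.decode(CODECS[from_cp]).encode(CODECS[to_cp])

print(convert(b"\xe9", 1252, 65001))  # b'\xc3\xa9' ("é" re-encoded as UTF-8)
```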

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 15:05:16 +0100 Sven Barth wrote: >[...] > While both AnsiString and UnicodeString have the current codepage and the > character size in their header record AFAIK UnicodeString has only a static (fixed) code page. Mattias

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Sven Barth
On 26.11.2014 12:37, "Michael Schnell" wrote: > > On 11/26/2014 12:09 PM, Sven Barth wrote: >> >> In Delphi (and FPC) CP_ACP corresponds by default to the current system codepage (e.g. CP1252 on a German Windows). > > > OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP12

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
After re-reading, yet another question: In section "String concatenations" there is no mention of auto-conversion. For statically typed Strings it's rather obvious that they will be auto-converted if appropriate. Technically - if differently encoded - they seem to be converted to Unicode a

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
On 11/26/2014 12:10 PM, Mattias Gaertner wrote: "the results of conversions from/to the CP_NONE code page are undefined." ... because CP_NONE is not a real code page. So you understand "result" as what you would get when printing. In the context of this wiki page I would understand "result" a

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
On 11/26/2014 12:13 PM, Mattias Gaertner wrote: In mode delphiunicode String=UnicodeString. I see. So even in Delphi XE, where "UnicodeString" is denoted by "CP_UTF16", the value of the constant CP_UTF16 is not the same as the value of the (constant or) variable CP_ACP (while OTOH using the v

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
On 11/26/2014 12:09 PM, Sven Barth wrote: In Delphi (and FPC) CP_ACP corresponds by default to the current system codepage (e.g. CP1252 on a German Windows). OK. So in Delphi XE (in Germany) String(CP_ACP) is the same as String(CP1252) but different from String without brackets which in tu

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 11:52:50 +0100 Michael Schnell wrote: > On 11/26/2014 11:40 AM, Mattias Gaertner wrote: > > AnsiString supports only one-byte-per-character code pages. > > Even more confused. Am I wrong in thinking that with code-page-aware Strings, > for Delphi XE compatibility, in Windows CP_AC

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell wrote: >[...] > 2) I fail to understand how, with this explanation that seems to force > auto-conversion for assignments between types with different "code page" > settings (also for CP_ACP), the "static code page can differ from the > dynamic c

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Sven Barth
On 26.11.2014 11:53, "Michael Schnell" wrote: > > On 11/26/2014 11:40 AM, Mattias Gaertner wrote: >> >> AnsiString supports only one-byte-per-character code pages. > > > Even more confused. Am I wrong in thinking that with code-page-aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to b

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Michael Schnell
On 11/26/2014 11:40 AM, Mattias Gaertner wrote: AnsiString supports only one-byte-per-character code pages. Even more confused. Am I wrong in thinking that with code-page-aware Strings, for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not right, then due later)? What is a "Co

Re: [fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

2014-11-26 Mattias Gaertner
On Wed, 26 Nov 2014 11:23:17 +0100 Michael Schnell wrote: >[...] > It seems to be unavoidable to use the name "ANSIString" even though I > always throw up when seeing a thing called "ANSI" containing Unicode > (e.g. "UTF8String = type AnsiString(CP_UTF8)"). Is there a question? > Seemi