Re: [fpc-devel] simple UTF tests
On 01/06/2012 06:53 PM, Hans-Peter Diettrich wrote:
> You're right, the XE compiler lacks some error checks :-(

If this indeed is considered a bug in Delphi, FPC _could_ in fact, in a more sane way, provide the length of an AnsiString(CP_UTF16) in terms of Words (i.e. UTF codes, as done with UTF8).

-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
Re: [fpc-devel] simple UTF tests
In our previous episode, Michael Schnell said:
> > You're right, the XE compiler lacks some error checks :-(
> If this indeed is considered a bug in Delphi, FPC _could_ in fact in a more sane way provide the length of an AnsiString(CP_UTF16) in terms of Words (i.e. UTF codes, as done with UTF8),

An ansistring is always 8-bit. Nothing can be done there, except warn/error if 1200 is used.
Re: [fpc-devel] simple UTF tests
On 01/09/2012 10:32 AM, Marco van de Voort wrote:
> An ansistring is always 8-bit.

Sorry, I can't follow here. Of course the term ANSI suggests 8 bit, but it also suggests one visible character = 8 bit, thus non-UTF. If a type called ANSI... is used to hold UTF codes, the term ANSI is abused anyway, and then the handling of the type can be defined in any way that seems appropriate.

-Michael
Re: [fpc-devel] simple UTF tests
P.S.: I found here http://en.wikipedia.org/wiki/Windows_ANSI_code_page#ANSI_code_page that in fact the following Windows code pages exist:

... 65000 UTF-7, 65001 UTF-8

But this seems to be a proprietary Microsoft definition, while AFAIK ANSI denotes American National Standards Institute.

- Michael
Re: [fpc-devel] simple UTF tests
Michael Schnell wrote:
> On 01/09/2012 10:32 AM, Marco van de Voort wrote:
> > An ansistring is always 8-bit.
> Sorry I can't follow here. Of course the term ANSI suggests 8 bit, but it also suggests one visible character = 8 bit, thus non UTF.

ANSI also covers MBCS (DBCS).

> If a type called ANSI... is used to hold UTF codes, the term ANSI is abused anyway and now the handling of the type can be defined in any way that seems appropriate,

For legacy reasons Ansi means types with 8 bit AnsiChar(!), in contrast to WideChar or other Char sizes.

DoDi
Re: [fpc-devel] simple UTF tests
In our previous episode, Michael Schnell said:
> > An ansistring is always 8-bit.
> Sorry I can't follow here. Of course the term ANSI suggests 8 bit, but it also suggests one visible character = 8 bit, thus non UTF.

No, it means that the encoding granularity is 8-bit. Length returns a count in units of the encoding granularity, not codepoints (always 32-bit, encoded in sequences of 8 (ansistring) or 16 (widestring, unicodestring) bits) or printable characters (possibly multiple codepoints).

> If a type called ANSI... is used to hold UTF codes, the term ANSI is abused anyway and now the handling of the type can be defined in any way that seems appropriate,

Whatever the name is, in all current Unicode Delphi versions and FPC, ansistring means 8-bit string exclusively.
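The distinction between encoding granularity and codepoints can be made concrete with a small sketch. It is written in Python purely as an illustration and says nothing about how FPC or Delphi implement Length; the sample string and variable names are made up for the example.

```python
# Illustration only (Python, not FPC/Delphi): the same text counted in
# 8-bit code units, 16-bit code units, and Unicode code points.
text = "a\u00e9\U0001D11E"  # 'a', 'e' with acute accent, and a character outside the BMP

utf8_units = len(text.encode("utf-8"))            # 1 + 2 + 4 bytes
utf16_units = len(text.encode("utf-16-le")) // 2  # 1 + 1 + 2 words (surrogate pair)
code_points = len(text)                           # Python counts code points

print(utf8_units, utf16_units, code_points)  # prints: 7 4 3
```

A Length that counts encoding units corresponds to the first number for an 8-bit string and to the second for a 16-bit string; neither equals the number of code points once multi-unit sequences appear.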
Re: [fpc-devel] simple UTF tests
On 01/09/2012 11:09 AM, Marco van de Voort wrote:
> Whatever the name is, in all current Unicode Delphi versions and FPC ansistring means 8-bit string exclusively.

OK, so AnsiString(CP_UTF16) in FPC and AnsiString(1200) in Delphi denote an 8-bit string whose data bytes represent UTF-16 codes in low-byte-first order. This of course is not very straightforward, sane or portable, but it is neither illegal nor ambiguous. (Of course this definition should be stated explicitly somewhere.)

-Michael
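As a rough picture of what such a string would hold, the sketch below builds the low-byte-first UTF-16 byte sequence for a short word. It is Python used only for illustration, not FPC or Delphi code, and the sample word and variable names are assumptions made for the example.

```python
# Illustration only (Python, not FPC/Delphi): UTF-16 codes stored
# low byte first in a sequence of 8-bit elements.
s = "Gr\u00fc\u00dfe"          # five characters, all inside the BMP
raw = s.encode("utf-16-le")     # the bytes an 8-bit string would carry

byte_len = len(raw)             # length counted in 8-bit elements
word_len = byte_len // 2        # length counted in 16-bit UTF-16 codes
first_word = raw[0] | (raw[1] << 8)  # reassemble the first code, low byte first

print(byte_len, word_len, hex(first_word))  # prints: 10 5 0x47
```

Under this reading, a byte-granular length is twice the word count, which is the mismatch the thread is arguing about.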
Re: [fpc-devel] simple UTF tests
> But this seems to be a proprietary Microsoft definition while AFAIK, ANSI denotes American National Standards Institute.

While ANSI does denote American National Standards Institute in general, it doesn't really mean that in this context.

A Windows machine has two main code pages in use (both language dependent, and for some languages they may be the same code page): the OEM code page and the ANSI code page. The OEM code page is one of the original PC code pages and afaict is mostly used for the console. The ANSI code page is used for the non-Unicode versions of stuff in Windows itself.

The term ANSI comes from the fact that the initial ANSI code page (1252) was based on an ANSI draft of what became ISO-8859-1. 1252 is fairly close to ISO-8859-1 (it just replaces rarely used control characters with more printable characters), but most of the other ANSI code pages bear little to no relationship to any ANSI or ISO standard encoding.

Afaict in Europe, America and Australasia both the ANSI and OEM code pages are simple encodings with one byte per user-visible character and all characters drawn left to right. Once you move to Asia and Africa, though, that no longer holds: CJK languages are represented by multibyte encodings, Vietnamese is represented using combining characters, and Middle Eastern languages bring the complications of bidirectional text. MS encourages programmers to use Unicode nowadays, and afaict languages added more recently to Windows (like the Indic languages) don't have any non-Unicode support at all.

Windows also defines other code page numbers that are used as neither ANSI nor OEM code pages. UTF-8 falls into this category.

Delphi is a Windows program (yeah, there was an abortive Linux port, but that came much later and didn't stick around for long), so it inherits Windows terminology. FPC/Lazarus is essentially a Delphi clone but is cross-platform, so it's put in the position of trying to interpret and stretch Windows-grounded ideas to fit a cross-platform context.
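The relationship between code page 1252 and ISO-8859-1 described above can be checked with a short sketch. It relies on Python's built-in "cp1252" and "latin-1" codecs, so it reflects Python's tables rather than Windows itself, and the variable names are made up for the example.

```python
# Illustration only: find the byte values that decode differently under
# Python's "cp1252" and "latin-1" (ISO-8859-1) codecs.
differing = []
for b in range(256):
    latin1 = bytes([b]).decode("latin-1")
    try:
        cp1252 = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None  # byte is unassigned in Python's cp1252 table
    if cp1252 != latin1:
        differing.append(b)

print(hex(differing[0]), hex(differing[-1]), len(differing))  # prints: 0x80 0x9f 32
```

Only the range 0x80 to 0x9F differs, which is where ISO-8859-1 places control characters; everything else decodes identically.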
Re: [fpc-devel] simple UTF tests
So it's me who did go insane. But at least I now know why.

-Michael
Re: [fpc-devel] simple UTF tests
Michael Schnell wrote:
> On 01/09/2012 11:09 AM, Marco van de Voort wrote:
> > Whatever the name is, in all current Unicode Delphi versions and FPC ansistring means 8-bit string exclusively.
> OK so the definition of AnsiString(CP_UTF16) in FPC and AnsiString(1200) in Delphi means

Nothing sane :-(

DoDi