Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On 04/13/2016 01:41 PM, Mattias Gaertner wrote:
> Why do you think that?

If that were the case, the function StringCodePage would need to check whether the encoding imposed on the string (e.g. a String constant), i.e. the final meaning of CP_ACP, is the same as the value of the variable DefaultSystemCodePage, which has been set at the start of the program. I looked at the code of StringCodePage, and it does not do so.

-Michael

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On Wed, 13 Apr 2016 12:53:49 +0200 Michael Schnell wrote:

> On 04/13/2016 11:08 AM, Sven Barth wrote:
> >
> > The code pages that are relevant here are only single byte code pages
> > (e.g. CP1252) or UTF-8, *never* UTF-16 as a AnsiString can not store
> > UTF-16 data.
> >
>
> StringCodePage(s) with an unqualified String return 0 (which is
> "CP_ACP", and seemingly means "Default").

It means DefaultSystemCodePage.

> But how to determine what encoding this default is, if (as we found) it
> can't be DefaultSystemCodePage and can be UTF16,

Why do you think that?

> which is dynamic, while
> the default for unqualified strings is static and *never* UTF-16 ?

Mattias
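The reply above says that the 0 (CP_ACP) returned by StringCodePage stands for DefaultSystemCodePage. The following is a minimal sketch of that resolution step, assuming FPC 3.0 or later. `EffectiveCodePage` is a hypothetical helper written for illustration; it is not part of the RTL.

```pascal
program EffectiveCP;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of the RTL: map the CP_ACP placeholder
// to the code page that DefaultSystemCodePage holds at the time of the call.
function EffectiveCodePage(const S: RawByteString): TSystemCodePage;
begin
  Result := StringCodePage(S);
  if Result = CP_ACP then
    Result := DefaultSystemCodePage;
end;

var
  S: String;  // with {$H+} this is an AnsiString with no explicit code page
begin
  S := 'abc';
  // Code page as recorded for the string itself.
  WriteLn('StringCodePage(S)    = ', StringCodePage(S));
  // Code page after resolving the CP_ACP placeholder.
  WriteLn('EffectiveCodePage(S) = ', EffectiveCodePage(S));
end.
```

The printed numbers depend on the platform and on which units have changed DefaultSystemCodePage, so no expected output is given here.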
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On 04/13/2016 11:08 AM, Sven Barth wrote:
> The code pages that are relevant here are only single byte code pages (e.g. CP1252) or UTF-8, *never* UTF-16 as a AnsiString can not store UTF-16 data.

StringCodePage(s) with an unqualified String returns 0 (which is "CP_ACP", and seemingly means "Default").

But how to determine what encoding this default is, if (as we found) it can't be DefaultSystemCodePage and can be UTF16, which is dynamic, while the default for unqualified strings is static and *never* UTF-16 ?

Michael
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On 04/13/2016 11:08 AM, Sven Barth wrote:
> The code pages that are relevant here are only single byte code pages (e.g. CP1252) or UTF-8, *never* UTF-16 as a AnsiString can not store UTF-16 data.

I see. Using an 8-bit encoding as the brand for the not explicitly user-defined "String" type makes perfect sense for a compiler that is supposed to create executables for multiple OSes.

But AFAIK, Unicode-aware Delphi uses UTF-16 for the not explicitly user-defined "String" type. (While AFAIK, non-Unicode-aware Delphi can't handle UTF-8.) So this is not compatible with either of them.

In fact I am not asking for such compatibility (let alone that the two Delphi variants are even more incompatible with each other), but we need to be aware of the issue.

-Michael
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On 13.04.2016 10:19, "Michael Schnell" wrote:
>
> BTW. according to the said wiki page (at the end of the page) I am wrong assuming that DefaultSystemCodePage is a constant introduced by the compiler.
>
> Now I still don't know whether/how the default encoding for the type "String" (which is different from DefaultSystemCodePage according to the wiki) is depending on the arch/OS the compiler is built for. (I only tested on Linux and here it rather obviously is UTF8. I assume on Windows it's UTF16 for Delphi compatibility).
>

The code pages that are relevant here are only single-byte code pages (e.g. CP1252) or UTF-8, *never* UTF-16, as an AnsiString cannot store UTF-16 data. And the DefaultSystemCodePage is determined upon startup from the system.

Regards,
Sven
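Sven's point that an AnsiString cannot hold UTF-16 comes down to element size: AnsiString is built from one-byte AnsiChar elements, while UTF-16 needs two-byte code units, which is what UnicodeString provides. A small sketch, assuming FPC 3.0 or later:

```pascal
program ElementSizes;
{$mode objfpc}{$H+}
begin
  // AnsiString elements are single bytes, so they can carry a single-byte
  // code page or UTF-8, but not 16-bit UTF-16 code units.
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));  // prints 1
  // UnicodeString elements are WideChar, i.e. one UTF-16 code unit each.
  WriteLn('SizeOf(WideChar) = ', SizeOf(WideChar));  // prints 2
end.
```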
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On Wed, 13 Apr 2016 09:59:33 +0200 Michael Schnell wrote:

>[...]
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Good page.

> Now I did some test and asked on that list and found that the default
> code page for the type "String" and with that the definition of
> TStrings and with that the working of TStringList.Add and friends
> depends on the setting of "DefaultSystemcodepage".

The "type" of String does not change.

> I understand that
> DefaultSystemcodepage is set when compiling the compiler (e.g. to UTF-8
> on Linux and (supposedly) to UTF-16 in Windows).

No. DefaultSystemCodePage is set at runtime, by the RTL and the widestring manager. The Lazarus unit LazUTF8 sets it to CP_UTF8.

Mattias
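A minimal sketch of the runtime behaviour described in this reply, assuming FPC 3.0 or later. Using SetMultiByteConversionCodePage to change the value is an assumption made for illustration; the thread only states that LazUTF8 sets DefaultSystemCodePage to CP_UTF8, not how.

```pascal
program RuntimeCodePage;
{$mode objfpc}{$H+}
begin
  // Value chosen by the RTL (and the widestring manager, if one is installed)
  // when the program starts. It varies with platform and locale.
  WriteLn('at startup: ', DefaultSystemCodePage);

  // Assumption for illustration: switch the default to UTF-8 at runtime,
  // which is the end state the thread attributes to unit LazUTF8.
  SetMultiByteConversionCodePage(CP_UTF8);

  // DefaultSystemCodePage is a variable, so the change is visible immediately.
  WriteLn('after call: ', DefaultSystemCodePage);
end.
```

Because the value is an ordinary variable changed at runtime, it cannot have been fixed when the compiler itself was built.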
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
On 2016-04-13 09:19, Michael Schnell wrote:
> I assume on
> Windows it's UTF16 for Delphi compatibility

No, that is not correct, at least for {$mode objfpc}. Maybe {$mode delphi} is different, I'm not sure. The system code page varies based on locale settings. I just went through this exercise with fcl-pdf.

Regards,
 - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
BTW, according to the said wiki page (at the end of the page) I am wrong in assuming that DefaultSystemCodePage is a constant introduced by the compiler.

Now I still don't know whether/how the default encoding for the type "String" (which is different from DefaultSystemCodePage according to the wiki) depends on the arch/OS the compiler is built for. (I only tested on Linux and here it rather obviously is UTF8. I assume on Windows it's UTF16 for Delphi compatibility).

-Michael
[Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"
There was a discussion on the fpc-pascal mailing list about a question a user (tobiasgiesen) asked (among other things) about storing strings of a certain encoding brand in a TStringList.

Here Juha recommended reading this page -> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Now I did some tests and asked on that list and found that the default code page for the type "String", and with that the definition of TStrings, and with that the working of TStringList.Add and friends, depends on the setting of "DefaultSystemcodepage". I understand that DefaultSystemcodepage is set when compiling the compiler (e.g. to UTF-8 on Linux and (supposedly) to UTF-16 in Windows).

So I suppose Lazarus can't do anything about that. Or are there plans to use different compilers / RTL variants to allow for "Better_Unicode_Support_in_Lazarus" (depending on the project, potentially better performance or better portability and Linux, explicitly using e.g. native pos() results instead of introducing CodePointPos(), ...) ?

-Michael