Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Michael Schnell

On 04/13/2016 01:41 PM, Mattias Gaertner wrote:

> Why do you think that?
If that were the case, the function StringCodePage would need to check 
whether the encoding imposed on the string (e.g. a String constant), 
i.e. the final meaning of CP_ACP, is the same as the value of the 
variable DefaultSystemCodePage, which was set at the start of the 
program.


I looked at the code of StringCodePage, and it does not do so.

-Michael
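
A minimal sketch of how one might check this, assuming FPC 3.0 or later
with {$mode objfpc}{$H+}. The program name is made up for illustration.
It only prints the code page tag stored with the string next to the
current value of DefaultSystemCodePage; what it prints depends on the
platform and locale.

```pascal
program CodePageOfString;
{$mode objfpc}{$H+}

var
  S: String;   // with {$H+} this is an AnsiString declared as CP_ACP
begin
  S := 'abc';
  // StringCodePage reports the code page tag stored with the string data.
  WriteLn('StringCodePage(S)     = ', StringCodePage(S));
  // DefaultSystemCodePage is an ordinary variable of the System unit.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  if StringCodePage(S) = CP_ACP then
    WriteLn('S is tagged CP_ACP (0), i.e. "use DefaultSystemCodePage"');
end.
```

If the first value is 0 while the second is not, the string still
carries the CP_ACP placeholder rather than a concrete code page.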

--
___
Lazarus mailing list
Lazarus@lists.lazarus.freepascal.org
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus


Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Mattias Gaertner
On Wed, 13 Apr 2016 12:53:49 +0200
Michael Schnell  wrote:

> On 04/13/2016 11:08 AM, Sven Barth wrote:
> >
> > The code pages that are relevant here are only single-byte code pages 
> > (e.g. CP1252) or UTF-8, *never* UTF-16 as an AnsiString cannot store 
> > UTF-16 data.
> >
> 
> StringCodePage(s) with an unqualified String returns 0 (which is 
> "CP_ACP", and seemingly means "Default").

It means DefaultSystemCodePage.

> But how to determine what encoding this default is, if (as we found) it 
> can't be DefaultSystemCodePage and can be UTF16,

Why do you think that?

> which is dynamic, while 
> the default for unqualified strings is static and *never* UTF-16 ?


Mattias



Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Michael Schnell

On 04/13/2016 11:08 AM, Sven Barth wrote:


> The code pages that are relevant here are only single-byte code pages 
> (e.g. CP1252) or UTF-8, *never* UTF-16 as an AnsiString cannot store 
> UTF-16 data.




StringCodePage(s) with an unqualified String returns 0 (which is 
"CP_ACP", and seemingly means "Default").


But how do we determine what encoding this default is if (as we found) 
it can't be DefaultSystemCodePage, which is dynamic and can be UTF-16, 
while the default for unqualified strings is static and *never* UTF-16?


Michael



Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Michael Schnell

On 04/13/2016 11:08 AM, Sven Barth wrote:


> The code pages that are relevant here are only single-byte code pages 
> (e.g. CP1252) or UTF-8, *never* UTF-16 as an AnsiString cannot store 
> UTF-16 data.


I see. Using an 8-bit encoding as the brand for the not explicitly 
user-defined "String" type makes perfect sense for a compiler that is 
supposed to create executables for multiple OSes.


But AFAIK, Unicode-aware Delphi uses UTF-16 for the not explicitly 
user-defined "String" type (while, AFAIK, non-Unicode-aware Delphi 
can't handle UTF-8).


So this is not compatible with either of them. In fact, I am not asking 
for such compatibility (not least because the two Delphi variants are 
even more incompatible with each other), but we need to be aware of the 
issue.


-Michael



Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Sven Barth
On 13.04.2016 10:19, "Michael Schnell" wrote:
>
> BTW. according to the said wiki page (at the end of the page) I am wrong
> assuming that DefaultSystemCodePage is a constant introduced by the
> compiler.
>
> Now I still don't know whether/how the default encoding for the type
> "String" (which is different from DefaultSystemCodePage according to the
> wiki) is depending on the arch/OS the compiler is built for. (I only tested
> on Linux and here it rather obviously is UTF8. I assume on Windows it's
> UTF16 for Delphi compatibility).
>

The code pages that are relevant here are only single-byte code pages (e.g.
CP1252) or UTF-8, *never* UTF-16, as an AnsiString cannot store UTF-16 data.

And the DefaultSystemCodePage is determined upon startup from the system.

Regards,
Sven
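
To illustrate the element-size point, here is a small sketch, assuming
FPC 3.0 or later; the program name is invented for the example. An
AnsiString is a sequence of 1-byte AnsiChar elements, so it can hold a
single-byte code page or UTF-8, whereas UTF-16 needs the 2-byte WideChar
elements of a UnicodeString.

```pascal
program ElementSizes;
{$mode objfpc}{$H+}

var
  A: AnsiString;      // elements are AnsiChar
  U: UnicodeString;   // elements are WideChar (UTF-16 code units)
begin
  A := 'abc';
  U := 'abc';
  WriteLn('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));   // 1
  WriteLn('SizeOf(WideChar) = ', SizeOf(WideChar));   // 2
  // Both strings hold three elements, but with different element widths.
  WriteLn('Length(A) = ', Length(A), ', Length(U) = ', Length(U));
end.
```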


Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Mattias Gaertner
On Wed, 13 Apr 2016 09:59:33 +0200
Michael Schnell  wrote:

>[...]
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Good page.


> Now I did some tests and asked on that list and found that the default 
> code page for the type "String" and with that the definition of 
> TStrings and with that the working of TStringList.Add and friends 
> depends on the setting of "DefaultSystemcodepage".

The "type" of String does not change.

> I understand that 
> DefaultSystemcodepage is set when compiling the compiler (e.g. to UTF-8 
> on Linux and (supposedly) to UTF-16 in Windows).

No. DefaultSystemCodePage is set at runtime, by the RTL and the
widestring manager.
The Lazarus unit LazUTF8 sets it to CP_UTF8.
 
Mattias
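
A minimal sketch of the runtime behaviour described above, assuming
FPC 3.0 or later. The program name is invented for the example. As far
as I know, LazUTF8 makes a similar call in its initialization, but that
is an assumption here and not something checked against its source.

```pascal
program RuntimeCodePage;
{$mode objfpc}{$H+}

begin
  // Value chosen by the RTL (and widestring manager, if any) at startup;
  // it depends on the platform and locale settings.
  WriteLn('At startup:     ', DefaultSystemCodePage);

  // Change the default at runtime; CP_UTF8 is declared in the System unit.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('After the call: ', DefaultSystemCodePage);
end.
```

The second line should report the UTF-8 code page regardless of what
the system reported at startup, which shows that the value is not fixed
when the compiler is built.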



Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Graeme Geldenhuys
On 2016-04-13 09:19, Michael Schnell wrote:
> I assume on 
> Windows it's UTF16 for Delphi compatibility

No, that is not correct, at least for {$mode objfpc}. Maybe {$mode
delphi} is different, I'm not sure. The system code page varies based on
locale settings. I just went through this exercise with fcl-pdf.

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp



Re: [Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Michael Schnell
BTW, according to said wiki page (at the end of the page), I am wrong 
in assuming that DefaultSystemCodePage is a constant introduced by the 
compiler.


Now I still don't know whether/how the default encoding for the type 
"String" (which is different from DefaultSystemCodePage according to 
the wiki) depends on the arch/OS the compiler is built for. (I only 
tested on Linux, and here it rather obviously is UTF-8. I assume on 
Windows it's UTF-16 for Delphi compatibility.)


-Michael



[Lazarus] wiki page "Better_Unicode_Support_in_Lazarus"

2016-04-13 Thread Michael Schnell
There was a discussion on the fpc-pascal mailing list about a question 
a user (tobiasgiesen) asked, among other things, about storing strings 
of a certain encoding brand in a TStringList.


There, Juha recommended reading this page: 
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus


Now I did some tests and asked on that list, and found that the default 
code page for the type "String", and with it the definition of TStrings 
and the behavior of TStringList.Add and friends, depends on the setting 
of "DefaultSystemCodePage". I understand that DefaultSystemCodePage is 
set when compiling the compiler (e.g. to UTF-8 on Linux and 
(supposedly) to UTF-16 on Windows).


So I suppose Lazarus can't do anything about that. Or are there plans to 
use different compiler/RTL variants to allow for 
"Better_Unicode_Support_in_Lazarus" (depending on the project, 
potentially better performance or better portability and Linux, 
explicitly using e.g. native pos() results instead of introducing 
CodePointPos(), ...)?


-Michael


