Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Seth Grover
> Not necessarily:
> if you are dealing with UTF8/UnicodeString and other codepages, it is
> quite likely, and preferred, that you have unit cwstring included.
> Michael.

Question about this: If I use cthreads, cwstring, and load my own memory
manager, what should the order in the uses clause be? I know if they are
loaded/unloaded in the wrong order you can get SIGSEGVs during shutdown.

From my experience, cthreads needs to be first. So should it be:

cthreads, my_memory_manager, cwstring

or

cthreads, cwstring, my_memory_manager?

Thanks,

-SG

--
Seth Grover
___
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:

> On 2016-04-04 10:43, Michael Van Canneyt wrote:
>>
>> No. It says 'typically'.
>>
>> That doesn't necessarily mean it is so, but is mostly correct.
>
> It is still a very vague assumption. As I mentioned in my previous
> reply, that statement is false on my FreeBSD system too. So I should
> read the wiki as "typically incorrect" ;-)
>
>> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
>> I know people that still use ISO-8859 (or something similar). Only the
>> programmer can know what is correct.
>
> Just curious, how do you change the default codepage on a Linux system?
> By exporting a new value to the LANG environment variable?

Yes. And various LC_ environment vars.
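
To see the effect of these variables, a small test program can print them
next to the code page the RTL picked up. This is only a sketch: the program
name showlocale is made up, and it assumes a Unix-like system where cwstring
is available, since the Default*CodePage variables only get their real
values when a widestring manager unit is included.

```pascal
program showlocale;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // widestring manager; without it the locale is not consulted
  {$endif}
  SysUtils;

begin
  // POSIX precedence for the character set: LC_ALL, then LC_CTYPE, then LANG
  WriteLn('LC_ALL   = ', GetEnvironmentVariable('LC_ALL'));
  WriteLn('LC_CTYPE = ', GetEnvironmentVariable('LC_CTYPE'));
  WriteLn('LANG     = ', GetEnvironmentVariable('LANG'));
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Running it once under a UTF-8 locale and once with, say,
LANG=en_US.ISO-8859-1 ./showlocale should report different code pages,
provided the second locale is actually installed on the system.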

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:58, Jonas Maebe wrote:
> I have now updated the page to reflect this  
> fact.


Thanks Jonas, that's an important point to note.

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:43, Michael Van Canneyt wrote:
> 
> No. It says 'typically'.
> 
> That doesn't necessarily mean it is so, but is mostly correct.

It is still a very vague assumption. As I mentioned in my previous
reply, that statement is false on my FreeBSD system too. So I should
read the wiki as "typically incorrect" ;-)


> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
> I know people that still use ISO-8859 (or something similar). Only the
> programmer can know what is correct.

Just curious, how do you change the default codepage on a Linux system?
By exporting a new value to the LANG environment variable?



Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
> RTL uses here by default CP_UTF8."
> 
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0.

So the "typically" seems more like "unlikely in most cases".

Regards,
  - Graeme -



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Jonas Maebe wrote:

> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>
>> Don't be too hasty in changing things. Jonas (who created that page) is
>> usually very careful in such matters.
>
> I did forget to mention that the Default*CodePage variables are only
> initialised with the "real" values on *nix platforms if you include a
> widestring manager unit. I have now updated the page to reflect this fact.


I will update the docs with such info for the upcoming 3.0.2.
It clearly needs mentioning...

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


Michael Van Canneyt wrote on Mon, 04 Apr 2016:

> Don't be too hasty in changing things. Jonas (who created that page)
> is usually very careful in such matters.

I did forget to mention that the Default*CodePage variables are only
initialised with the "real" values on *nix platforms if you include a
widestring manager unit. I have now updated the page to reflect this
fact.



Jonas


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe


tobiasgiesen wrote on Mon, 04 Apr 2016:

>> Your question was not about Lazarus but maybe you should read this:
>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system
> codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.


That's because you don't use the cwstring unit nor another widestring  
manager. In that case, the DefaultSystemCodePage is set to ASCII  
because the default string conversion routines on Unix platforms do  
not support anything else (neither for converting from  
ansi/shortstring to wide/unicodestring, nor between ansistrings using  
different codepages). This is not new with FPC 3.0, it has always been  
like that.
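
The difference is easy to check by building one source file twice. In this
sketch the program name and the define USE_CWSTRING are made up for the
example; it assumes a Unix-like target, since cwstring only exists there.

```pascal
program cpcheck;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}

// Build once with "fpc cpcheck.pas" and once with
// "fpc -dUSE_CWSTRING cpcheck.pas", then compare the output.
{$ifdef USE_CWSTRING}
uses
  cwstring;   // installs a widestring manager based on the C library
{$endif}

begin
  WriteLn('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
end.
```

Without the define, the first value is expected to be the ASCII code page
(the 20127 reported earlier in this thread); with it, the value should
follow the locale settings of the session.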



Jonas


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:

>> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>>
>>>> Your question was not about Lazarus but maybe you should read this:
>>>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>>>
>>> Very interesting, but apparently there is some wrong info.
>>>
>>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
>>> RTL uses here by default CP_UTF8."
>>>
>>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>>>
>>> Should I fix this bit on the documentation page?
>>
>> No. It says 'typically'.
>
> That's the first part of the sentence.
>
> The second part says: "the RTL uses here by default CP_UTF8."
>
> That is wrong. It does not (at least not on Mac). This must be fixed.

Why do you think it is wrong?
It is perhaps wrong on your computer. Maybe it checks the environment?

Did you include clocale, etc.? There are maybe a hundred things that can
change this.

Don't be too hasty in changing things. 
Jonas (who created that page) is usually very careful in such matters.


Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 11:35:53 +0200 (CEST)
Michael Van Canneyt  wrote:

>[...]
> >> Then no conversions will be done for all ansistrings that contain UTF8.
> >
> > And this really means AnsiString, not AnsiString(something).
> 
> The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

UTF8String is type AnsiString(CP_UTF8), and if you mix that with
AnsiString the compiler adds conversion code, because at compile time
CP_ACP is not UTF-8.
These kinds of traps confuse people.
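
A minimal sketch of such a mix, for anyone who wants to look at it on their
own system. The program name is made up, and only ASCII data is used so the
source file encoding does not matter; the point is just to print the code
page each string carries at run time.

```pascal
program mixcp;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;   // so that DefaultSystemCodePage reflects the locale
{$endif}

var
  u: UTF8String;   // declared code page: CP_UTF8
  a: AnsiString;   // declared code page: CP_ACP, resolved at run time
begin
  u := 'abc';
  a := u;          // the declared code pages differ, so the compiler
                   // emits a call to a conversion helper here
  WriteLn('StringCodePage(u)     = ', StringCodePage(u));
  WriteLn('StringCodePage(a)     = ', StringCodePage(a));
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Whether the helper actually changes any bytes depends on the value of
DefaultSystemCodePage when the assignment runs, which is why the same
source can behave differently from one machine to the next.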

Mattias


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> 
> 
> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
> 
> >> Your question was not about Lazarus but maybe you should read this:
> >>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
> >
> > Very interesting, but apparently there is some wrong info.
> >
> > It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so
> > the RTL uses here by default CP_UTF8."
> >
> > Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
> >
> > Should I fix this bit on the documentation page?
> 
> No. It says 'typically'.

That's the first part of the sentence.

The second part says: "the RTL uses here by default CP_UTF8."

That is wrong. It does not (at least not on Mac). This must be fixed.

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:

>> Your question was not about Lazarus but maybe you should read this:
>>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
> RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Should I fix this bit on the documentation page?

No. It says 'typically'.

That doesn't necessarily mean it is so, but is mostly correct.

> Apparently the LCL assumes that FCL sets it to UTF-8, but it does not.
>
> I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
I know people that still use ISO-8859 (or something similar). Only the
programmer can know what is correct.

Michael.


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> Your question was not about Lazarus but maybe you should read this:
>   http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Very interesting, but apparently there is some wrong info.

It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the
RTL uses here by default CP_UTF8."

Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Should I fix this bit on the documentation page?

Apparently the LCL assumes that FCL sets it to UTF-8, but it does not.

I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

Cheers,
Tobias



Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Mattias Gaertner wrote:

> On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
> Michael Van Canneyt  wrote:
>
>> [...]
>> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
>
> I think it is important to note how to do this properly:
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> You should add these lines in an early initialization section. The
> beginning of your program might be too late.
>
>> Then no conversions will be done for all ansistrings that contain UTF8.
>
> And this really means AnsiString, not AnsiString(something).



The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

Michael


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
Michael Van Canneyt  wrote:

>[...]
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.

I think it is important to note how to do this properly:

  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

You should add these lines in an early initialization section. The
beginning of your program might be too late.
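
One way to get an "early initialization section" is a tiny unit of its own.
The following is only a sketch: the unit name utf8setup is made up, and
listing it right after cwstring in the program's uses clause is an
assumption, made so that no other unit's initialization runs in between and
cwstring does not overwrite the setting afterwards.

```pascal
unit utf8setup;   // hypothetical unit name, for illustration only
{$mode objfpc}{$H+}

interface

// Nothing to export; the unit exists only for its initialization section.

implementation

initialization
  // Treat plain AnsiString data as UTF-8 ...
  SetMultiByteConversionCodePage(CP_UTF8);
  // ... and do the same for file names passed to RTL file routines.
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);
end.
```

In the main program this would then read something like
"uses cthreads, cwstring, utf8setup, ...;" with the remaining units after it.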

> Then no conversions will be done for all ansistrings that contain UTF8.

And this really means AnsiString, not AnsiString(something).

Mattias


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:36 AM,   wrote:
> Sorry, I was not able to come to that conclusion from the existing docs.

Your question was not about Lazarus but maybe you should read this:
  http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
It works also without LCL.
The bottom line is: remove all explicit conversion functions.

Juha


Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
> Then no conversions will be done for all ansistrings that contain UTF8.

Fantastic. Many thanks. That fixes my problem entirely (I think).

Sorry, I was not able to come to that conclusion from the existing docs.

Cheers,
Tobias




Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt



On Mon, 4 Apr 2016, Tobias Giesen wrote:

> Hello,
>
> my application uses the AnsiString type to store UTF-8 data. That was
> totally fine. Now in FPC 3, automatic conversions cause data loss. I
> get question marks replacing Chinese characters, for example.
>
> I do not fully understand at which points these conversions are done.
> The FPC 3 Unicode documentation says something about "passing it to a
> RTL routine".
>
> What about this code:
> var a,b:Ansistring;
> begin
>   a:=Utf8Encode(AWideString);
>   b:=Copy(a,1,10);
>   end;
>
> Is "Copy" an RTL routine? Is this OK or not?
>
> Best for me would be to be able to turn the conversions off completely.



You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
Then no conversions will be done for all ansistrings that contain UTF8.

Michael.


[fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Tobias Giesen
Hello,

my application uses the AnsiString type to store UTF-8 data. That was
totally fine. Now in FPC 3, automatic conversions cause data loss. I
get question marks replacing Chinese characters, for example.

I do not fully understand at which points these conversions are done.
The FPC 3 Unicode documentation says something about "passing it to a
RTL routine".

What about this code:
var a, b: AnsiString;
begin
  a := Utf8Encode(AWideString);
  b := Copy(a, 1, 10);
end;

Is "Copy" an RTL routine? Is this OK or not?

Best for me would be to be able to turn the conversions off completely.
Is that possible?
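
For reference, the snippet above can be turned into a small stand-alone test
that reports which code page each string carries. This is a diagnostic
sketch only: the program name and the sample text are made up, and it does
not by itself show where the data loss happens.

```pascal
program copycheck;   // hypothetical name, for illustration only
{$mode objfpc}{$H+}

var
  w: WideString;
  a, b: AnsiString;
begin
  w := 'hello world, hello';   // ASCII sample text, stands in for AWideString
  a := UTF8Encode(w);
  b := Copy(a, 1, 10);         // Copy counts bytes here, not characters
  WriteLn('StringCodePage(a)     = ', StringCodePage(a));
  WriteLn('StringCodePage(b)     = ', StringCodePage(b));
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

If the code pages printed for a and b differ from DefaultSystemCodePage,
later operations that mix these strings with other string types are the
places to look for conversions. Note also that Copy(a, 1, 10) takes ten
bytes, so with real UTF-8 data it may cut a multi-byte character in half.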

Cheers,
Tobias

