Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> Not necessarily:
> if you are dealing with UTF8/UnicodeString and other codepages, it is
> quite likely, and preferred, that you have unit cwstring included.
>
> Michael.

Question about this: If I use cthreads, cwstring, and load my own memory manager, what should the order in the uses clause be? I know if they are loaded/unloaded in the wrong order you can get SIGSEGVs during shutdown. From my experience cthreads needs to be first. So should it be:

  cthreads, my_memory_manager, cwstring

or

  cthreads, cwstring, my_memory_manager?

Thanks,

-SG

--
Seth Grover
___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:

> On 2016-04-04 10:43, Michael Van Canneyt wrote:
> > No. It says 'typically'.
> >
> > That doesn't necessarily mean it is so, but is mostly correct.
>
> It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my FreeBSD system too. So I should read the wiki as "typically incorrect" ;-)
>
> > No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
> > I know people that still use ISO-8859 (or something similar). Only the
> > programmer can know what is correct.
>
> Just curious, how do you change the default codepage on a Linux system? By exporting a new value to the LANG environment variable?

Yes. And various LC_ environment vars.

Michael.
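To see the effect of LANG and the LC_ variables mentioned above, a small test program along the following lines may help. It is only a sketch: the program name is made up, and the choice of LC_ALL and LC_CTYPE as the relevant LC_ variables is an assumption (they are the usual POSIX variables governing character encoding; the thread does not name specific ones). cwstring is included because, as discussed elsewhere in this thread, the Default*CodePage variables only get their "real" values on *nix when a widestring manager is present.

```pascal
program showlocale; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager, needed on *nix
  SysUtils;

const
  // Assumption: these are the locale variables most likely to matter here.
  Names: array[0..2] of string = ('LC_ALL', 'LC_CTYPE', 'LANG');

var
  i: Integer;
begin
  // Print the locale-related environment variables as the program sees them.
  for i := Low(Names) to High(Names) do
    WriteLn(Names[i], ' = "', GetEnvironmentVariable(Names[i]), '"');

  // Print the code page the RTL picked up at startup.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Running it once normally and once with a different value exported (for example LANG set to an ISO-8859-1 locale, provided such a locale is installed on the system) should show whether the reported code page follows the environment.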
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:58, Jonas Maebe wrote:
> I have now updated the page to reflect this
> fact.

Thanks Jonas, that's an important point to note.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:43, Michael Van Canneyt wrote:
> No. It says 'typically'.
>
> That doesn't necessarily mean it is so, but is mostly correct.

It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my FreeBSD system too. So I should read the wiki as "typically incorrect" ;-)

> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
> I know people that still use ISO-8859 (or something similar). Only the
> programmer can know what is correct.

Just curious, how do you change the default codepage on a Linux system? By exporting a new value to the LANG environment variable?

Regards,
  - Graeme -

--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage,
> so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0. So the "typically" seems more like "unlikely in most cases".

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Jonas Maebe wrote:

> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>
> > Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.
>
> I did forget to mention that the Default*CodePage variables are only initialised with the "real" values on *nix platforms if you include a widestring manager unit. I have now updated the page to reflect this fact.

I will update the docs with such info for the upcoming 3.0.2. It clearly needs mentioning...

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
Michael Van Canneyt wrote on Mon, 04 Apr 2016:

> Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.

I did forget to mention that the Default*CodePage variables are only initialised with the "real" values on *nix platforms if you include a widestring manager unit. I have now updated the page to reflect this fact.

Jonas
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
tobiasgiesen wrote on Mon, 04 Apr 2016:

> > Your question was not about Lazarus but maybe you should read this:
> > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

That's because you don't use the cwstring unit nor another widestring manager. In that case, the DefaultSystemCodePage is set to ASCII because the default string conversion routines on Unix platforms do not support anything else (neither for converting from ansi/shortstring to wide/unicodestring, nor between ansistrings using different codepages). This is not new with FPC 3.0, it has always been like that.

Jonas
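The difference described above can be checked with a short program that is compiled twice, once with and once without a widestring manager. This is a sketch under stated assumptions: the program name and the USE_CWSTRING define are invented for illustration and are not part of FPC.

```pascal
program showdefaultcp; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

uses
  {$if defined(unix) and defined(USE_CWSTRING)}
  cwstring,  // installs a widestring manager on Unix platforms
  {$endif}
  SysUtils;  // kept so the uses clause is never empty

begin
  // Without a widestring manager the thread reports ASCII (20127) on a Mac;
  // with cwstring the value should reflect the system locale instead.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Building once with "fpc showdefaultcp.pas" and once with "fpc -dUSE_CWSTRING showdefaultcp.pas" and comparing the two outputs should reproduce the behaviour on a given system.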
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:

> > On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
> >
> > > > Your question was not about Lazarus but maybe you should read this:
> > > > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
> > >
> > > Very interesting, but apparently there is some wrong info.
> > >
> > > It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
> > >
> > > Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
> > >
> > > Should I fix this bit on the documentation page?
> >
> > No. It says 'typically'.
>
> That's the first part of the sentence. The second part says: "the RTL uses here by default CP_UTF8." That is wrong. It does not (at least not on Mac). This must be fixed.

Why do you think it is wrong? It is perhaps wrong on your computer. Maybe it checks the environment? Did you include clocale? etc. There are maybe a hundred things that can change this.

Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016 11:35:53 +0200 (CEST) Michael Van Canneyt wrote:

> [...]
> > > Then no conversions will be done for all ansistrings that contain UTF8.
> >
> > And this really means AnsiString, not AnsiString(something).
>
> The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

UTF8String is type AnsiString(CP_UTF8) and if you mix that with AnsiString the compiler adds conversion code, because at compile time CP_ACP is not UTF-8. These kinds of traps confuse people.

Mattias
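One way to observe the trap described above is to print the dynamic code page of each string after an assignment between the two types. The following is a rough sketch only: the program name is made up, and the values printed depend on the platform and on whether DefaultSystemCodePage has been changed, so no expected output is given.

```pascal
program cpmix; // hypothetical name, for illustration only

{$mode objfpc}{$H+}

var
  u: UTF8String;  // AnsiString(CP_UTF8), as stated in the message above
  a: AnsiString;  // AnsiString(CP_ACP): follows DefaultSystemCodePage at run time
begin
  u := 'abc';
  // The declared code pages differ, so this is the kind of assignment
  // where the compiler adds conversion code.
  a := u;

  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('StringCodePage(u)     = ', StringCodePage(u));
  WriteLn('StringCodePage(a)     = ', StringCodePage(a));
end.
```

If the two StringCodePage values differ, a conversion took place on assignment; with non-ASCII content and an ASCII default code page that is where characters can be lost.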
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>
> > > Your question was not about Lazarus but maybe you should read this:
> > > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
> >
> > Very interesting, but apparently there is some wrong info.
> >
> > It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
> >
> > Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
> >
> > Should I fix this bit on the documentation page?
>
> No. It says 'typically'.

That's the first part of the sentence. The second part says: "the RTL uses here by default CP_UTF8." That is wrong. It does not (at least not on Mac). This must be fixed.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:

> > Your question was not about Lazarus but maybe you should read this:
> > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Should I fix this bit on the documentation page?

No. It says 'typically'. That doesn't necessarily mean it is so, but is mostly correct.

> Apparently the LCL assumes that FCL sets it to UTF-8, but it does not. I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ. I know people that still use ISO-8859 (or something similar). Only the programmer can know what is correct.

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> Your question was not about Lazarus but maybe you should read this:
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Very interesting, but apparently there is some wrong info.

It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."

Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Should I fix this bit on the documentation page?

Apparently the LCL assumes that FCL sets it to UTF-8, but it does not. I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Mattias Gaertner wrote:

> On Mon, 4 Apr 2016 10:32:58 +0200 (CEST) Michael Van Canneyt wrote:
>
> > [...]
> > You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
>
> I think it is important to note how to do this properly:
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> You should add these lines in an early initialization section. The beginning of your program might be too late.
>
> > Then no conversions will be done for all ansistrings that contain UTF8.
>
> And this really means AnsiString, not AnsiString(something).

The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

Michael
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016 10:32:58 +0200 (CEST) Michael Van Canneyt wrote:

> [...]
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.

I think it is important to note how to do this properly:

  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

You should add these lines in an early initialization section. The beginning of your program might be too late.

> Then no conversions will be done for all ansistrings that contain UTF8.

And this really means AnsiString, not AnsiString(something).

Mattias
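As a rough illustration of the "early initialization section" advice above, the two calls could live in a tiny unit of their own that is listed near the top of the program's uses clause. This is a minimal sketch, not an official recipe: the unit name utf8init is invented, and where exactly it should sit relative to units such as cthreads or cwstring is not settled in this thread.

```pascal
unit utf8init; // hypothetical unit name, not part of FPC

{$mode objfpc}{$H+}

interface

implementation

initialization
  // Unit initialization sections run before the main program's
  // begin..end block, which is why the calls are placed here.
  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);
end.
```

A program would then simply name utf8init early in its uses clause; units listed after it are initialized with the UTF-8 setting already in place.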
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, Apr 4, 2016 at 11:36 AM, wrote:
> Sorry, I was not able to come to that conclusion from the existing docs.

Your question was not about Lazarus but maybe you should read this:
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

It works also without LCL.
The bottom line is: remove all explicit conversion functions.

Juha
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
> Then no conversions will be done for all ansistrings that contain UTF8.

Fantastic. Many thanks. That fixes my problem entirely (I think).

Sorry, I was not able to come to that conclusion from the existing docs.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Tobias Giesen wrote:

> Hello,
>
> my application uses the AnsiString type to store UTF-8 data. That was totally fine.
>
> Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example.
>
> I do not fully understand at which points these conversions are done. The FPC 3 Unicode documentation says something about "passing it to a RTL routine".
>
> What about this code:
>
>   var a, b: AnsiString;
>   begin
>     a := Utf8Encode(AWideString);
>     b := Copy(a, 1, 10);
>   end;
>
> Is "Copy" an RTL routine? Is this OK or not?
>
> Best for me would be to be able to turn the conversions off completely.

You cannot, but you can set DefaultSystemCodePage to CP_UTF8. Then no conversions will be done for all ansistrings that contain UTF8.

Michael.
[fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
Hello,

my application uses the AnsiString type to store UTF-8 data. That was totally fine.

Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example.

I do not fully understand at which points these conversions are done. The FPC 3 Unicode documentation says something about "passing it to a RTL routine".

What about this code:

  var a, b: AnsiString;
  begin
    a := Utf8Encode(AWideString);
    b := Copy(a, 1, 10);
  end;

Is "Copy" an RTL routine? Is this OK or not?

Best for me would be to be able to turn the conversions off completely. Is that possible?

Cheers,
Tobias