Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 04/04/2016 06:23 AM, tobiasgie...@gmail.com wrote:
>> OK, I just confirmed. Adding clocale to my 5-line test program doesn't affect the DefaultSystemCodePage result, but as soon as I add cwstring to the uses clause, then DefaultSystemCodePage returns 65001.
>
> On Mac, not even cwstring does that. It sets the DefaultSystemCodePage to 20127. So, on Mac, the DefaultSystemCodePage is not "typically" set to UTF_8. It is never set to UTF_8 unless you do it yourself.

FWIW: I keep seeing the argument "on Mac" but never has the OS on that Mac been mentioned... AFAIK there is more than one OS for Mac, or at least more than one version of the OS... It is possible that the default has been changed, plus there's whatever was selected for the language during the installation... This really should be clarified for your Mac and its OS...

-- 
NOTE: No off-list assistance is given without prior approval. *Please keep mailing list traffic on the list* unless private contact is specifically requested and granted.
___ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> Not necessarily:
> if you are dealing with UTF8/UnicodeString and other codepages, it is quite likely, and preferred, that you have unit cwstring included.
> Michael.

Question about this: If I use cthreads, cwstring, and load my own memory manager, what should the order in the uses clause be? I know if they are loaded/unloaded in the wrong order you can get SIGSEGVs during shutdown. From my experience cthreads needs to be first. So should it be:

cthreads, my_memory_manager, cwstring

or

cthreads, cwstring, my_memory_manager?

Thanks,
-SG

-- 
Seth Grover
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
tobiasgiesen wrote on Mon, 04 Apr 2016:
>>> Terminal has LC_CTYPE=UTF-8.
>>
>> What about LC_ALL?
>
> My Mac OS installations do not have LC_ALL. But I just noticed that Carbon GUI programs do not get LC_CTYPE in their environment either.

If none of the environment variables related to code pages are set, FPC falls back to UTF-8 for (among others) OS X.

> So maybe cwstring needs to be fixed for Carbon GUI Mac OS X programs.

If you get ASCII, it means that one of the LC_ALL, LC_CTYPE and/or LANG environment variables is set to a value that corresponds to ASCII (such as "C"), or to a value that is not recognised as, or translatable into, a Windows code page number.

> What I see in the environment is
> __CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0
> I think Carbon apps should override DefaultSystemCodePage, because the Carbon interfaces always use UTF-8; they do not care about any environment strings.

On OS X, unlike on Windows, there is no inherent difference between "GUI" (be it Carbon, Cocoa, or, most likely, a mixture of the two) and "non-GUI" applications. You can have command line applications linking to a Carbon framework to deal with aliases, and a GUI application calling into libutil to open a pseudo tty.

The above environment variable is also unrelated to Carbon; it comes from CoreFoundation. 0x1F5 is the hexadecimal value of your user ID. At least one of the 0x0's indeed refers to the default/ansi encoding of CoreFoundation, but it's definitely not the value you want to use: it's the value of the MacRoman text encoding.
That said, FPC 3.1.1 also contains an OS X/iOS-specific widestring manager unit, iosxwstr, that you can use instead of cwstring, and that one will always default to UTF-8. The "ansi" code page of CoreFoundation only makes sense from a classic Mac backward-compatibility standpoint, which we don't have to care about because we don't have a legacy code base that depends on this default setting. If someone wanted to port code that depends on it to FPC, they would have to set it themselves.

Jonas
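A minimal sketch of selecting a widestring manager per platform, assuming that FPC_FULLVERSION 30101 identifies the 3.1.1 development version mentioned above (iosxwstr is not part of FPC 3.0.x). The program name is illustrative; the program only prints DefaultSystemCodePage so the result can be compared across machines.

```pascal
program PickWidestringManager; // illustrative name

{$mode objfpc}{$H+}

uses
  {$if defined(darwin) and (FPC_FULLVERSION >= 30101)}
  iosxwstr,  // OS X/iOS widestring manager; defaults to UTF-8 per Jonas's message
  {$elseif defined(unix)}
  cwstring,  // generic Unix manager: code page derived from LC_ALL/LC_CTYPE/LANG
  {$endif}
  SysUtils;  // not strictly needed; keeps the uses clause non-empty on every target

begin
  // 65001 = CP_UTF8, 20127 = CP_ASCII
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Picking the unit at compile time keeps a single source file working with both FPC 3.0.x (cwstring only) and newer compilers.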
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 13:15, Sven Barth wrote:
> Qt uses UTF-16 as well...

I always thought that strange. After all, Qt was born as a Unix-type GUI toolkit, unless I got my facts wrong. Then again, it's only in recent years that Unix-like systems moved to UTF-8. I think even FreeBSD didn't use UTF-8 out of the box until the last one or two releases.

Java also uses UTF-16. What I like about those two is that they only have one string type. No confusion, but then they don't have as long a history as Pascal, so no legacy code to worry about.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 04.04.2016 13:21, "Graeme Geldenhuys" <mailingli...@geldenhuys.co.uk> wrote:
> On 2016-04-04 12:06, Michael Van Canneyt wrote:
>> 1. Using UTF8 is a choice of Lazarus. Other people may prefer UnicodeString.
>> On Windows, UnicodeString is more 'natural' or 'native'.
>
> Based on Internet standards and most popular OSes (mobile devices included), UTF-8 is king - so we all know Windows backed the wrong horse [encoding]. ;-)
>
> [...Graeme runs and hides...]

Qt uses UTF-16 as well... (and our company's OS uses UTF-32)

Regards,
Sven
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016, Jonas Maebe wrote:
> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>> On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
>>> [add LCL UTF-8 helper units to FPC]
>>> Though it could probably be added as early as FPC 3.0.2. It's simply two new units that need to be explicitly used by somebody to have any effect, so it will not break existing code otherwise [if not used].
>>
>> They should at least be renamed, to avoid confusion. Other than that, I personally see no objections.
>
> I do: it's more units that we have to maintain, process bug reports and feature requests for, etc. (or, in case they are supposed to remain copies of the Lazarus units, then it's extra work keeping them in sync, and given the non-synchronised release cycles, they will almost never be in sync). We already have plenty of work with our own code.

And that is why I wrote 'personally' :-)

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
>> Terminal has LC_CTYPE=UTF-8.
>
> What about LC_ALL?

My Mac OS installations do not have LC_ALL. But I just noticed that Carbon GUI programs do not get LC_CTYPE in their environment either. So maybe cwstring needs to be fixed for Carbon GUI Mac OS X programs.

What I see in the environment is
__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x0

I think Carbon apps should override DefaultSystemCodePage, because the Carbon interfaces always use UTF-8; they do not care about any environment strings.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
> On 2016-04-04 12:06, Michael Van Canneyt wrote:
>> 1. Using UTF8 is a choice of Lazarus. Other people may prefer UnicodeString.
>> On Windows, UnicodeString is more 'natural' or 'native'.
>
> Based on Internet standards and most popular OSes (mobile devices included), UTF-8 is king - so we all know Windows backed the wrong horse [encoding]. ;-)
>
> [...Graeme runs and hides...]

Well, in 2016, I still only use UTF-8, even on Windows. It works without problems if you know what you're doing.

>> 2. The release cycle of FPC is rather long, so updates will be available not as fast as the Lazarus team needs them.
>
> That's a valid point. Though it could probably be added as early as FPC 3.0.2. It's simply two new units that need to be explicitly used by somebody to have any effect, so it will not break existing code otherwise [if not used].

They should at least be renamed, to avoid confusion. Other than that, I personally see no objections.

Michael.
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 12:06, Michael Van Canneyt wrote:
> 1. Using UTF8 is a choice of Lazarus. Other people may prefer UnicodeString.
> On Windows, UnicodeString is more 'natural' or 'native'.

Based on Internet standards and most popular OSes (mobile devices included), UTF-8 is king - so we all know Windows backed the wrong horse [encoding]. ;-)

[...Graeme runs and hides...]

> 2. The release cycle of FPC is rather long, so updates will be available not
> as fast as the Lazarus team needs them.

That's a valid point. Though it could probably be added as early as FPC 3.0.2. It's simply two new units that need to be explicitly used by somebody to have any effect, so it will not break existing code otherwise [if not used].

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
tobiasgiesen wrote on Mon, 04 Apr 2016:
>> How did you get a codepage 20127 Mac?
>
> The Mac is UTF-8,

This statement makes no sense. There is no "UTF-8" or "non-UTF-8" Mac. The Unix environment on OS X can use any OS-supported code page.

> but cwstring or whatever does not realize it.

cwstring simply sets the code page based on how it is defined in your environment.

> Terminal has LC_CTYPE=UTF-8.

What about LC_ALL? LC_ALL overrides LC_CTYPE, because that is how the meaning of these environment variables is defined by POSIX (see e.g. http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html).

> Well I will just set the default codepages manually.

Then you will probably be back in a few months with another message complaining how FPC 3.0 supposedly breaks X or Y, because you are merely hiding the real issue (e.g. when calling an external program, which may then try to interpret your UTF-8 command line arguments as plain ASCII).

Jonas
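To make the POSIX precedence concrete, a minimal sketch that assumes nothing about cwstring's real implementation: both helper functions are hypothetical, and the locale-to-code-page table is deliberately tiny. They only illustrate the rule that the first non-empty one of LC_ALL, LC_CTYPE and LANG decides, so LC_ALL=C wins over LC_CTYPE=UTF-8 and yields ASCII.

```pascal
program LocalePrecedence; // illustrative name

{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: the first non-empty value wins, following the
  POSIX order LC_ALL > LC_CTYPE > LANG. }
function EffectiveCTypeLocale(const LcAll, LcCType, Lang: string): string;
begin
  if LcAll <> '' then
    Result := LcAll
  else if LcCType <> '' then
    Result := LcCType
  else
    Result := Lang;
end;

{ Hypothetical, incomplete mapping from a locale name to a Windows code
  page number; returns -1 when the codeset is not recognised. }
function CodePageForLocale(const Locale: string): Integer;
var
  Codeset: string;
  DotPos, AtPos: SizeInt;
begin
  if (Locale = 'C') or (Locale = 'POSIX') then
    Exit(20127); // CP_ASCII
  DotPos := Pos('.', Locale);
  if DotPos = 0 then
    Codeset := Locale // a bare name such as 'UTF-8' (LC_CTYPE=UTF-8 on OS X)
  else
    Codeset := Copy(Locale, DotPos + 1, MaxInt);
  AtPos := Pos('@', Codeset); // drop modifiers such as '@euro'
  if AtPos > 0 then
    Codeset := Copy(Codeset, 1, AtPos - 1);
  Codeset := UpperCase(Codeset);
  if (Codeset = 'UTF-8') or (Codeset = 'UTF8') then
    Result := 65001 // CP_UTF8
  else if (Codeset = 'ISO-8859-1') or (Codeset = 'ISO8859-1') then
    Result := 28591
  else
    Result := -1;
end;

begin
  // LC_ALL=C overrides LC_CTYPE=UTF-8, so the result is ASCII.
  WriteLn(CodePageForLocale(EffectiveCTypeLocale('C', 'UTF-8', '')));               // prints 20127
  // Without LC_ALL, LC_CTYPE=UTF-8 decides.
  WriteLn(CodePageForLocale(EffectiveCTypeLocale('', 'UTF-8', 'en_GB.ISO8859-1'))); // prints 65001
end.
```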
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
>> more complete solution for UTF-8. This is useful for many users. They don't have to reinvent the wheel.
>
> Not having looked at the two units you mentioned... but if this is a general requirement for anybody using UTF-8 or similar with FPC 3.0, then wouldn't it be best to see if those units can be contributed to FPC's FCL? The ultimate "don't reinvent the wheel" location. ;-)

One would think so, but:

1. Using UTF8 is a choice of Lazarus. Other people may prefer UnicodeString. On Windows, UnicodeString is more 'natural' or 'native'.

2. The release cycle of FPC is rather long, so updates will be available not as fast as the Lazarus team needs them. And in view of 1. that may be a problem.

If memory serves well, there was initially an attempt by Felipe to get some of the functionality into FPC, but this was quickly abandoned due to the above arguments...

Michael.
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 11:34, Mattias Gaertner wrote:
> for that. In fact you don't have to use LazUtils: some users simply
> copied the two units FPCAdds and LazUTF8. It's all open source.

This was not made clear until you explicitly mentioned it. Juha's initial comment was vague on the matter, and the original poster never mentioned they used Lazarus or LCL.

> Second I find it funny that the statement comes from you

I simply wanted an answer or explanation that benefits anybody using FPC.

> more complete solution for UTF-8. This is useful for many users. They
> don't have to reinvent the wheel.

Not having looked at the two units you mentioned... but if this is a general requirement for anybody using UTF-8 or similar with FPC 3.0, then wouldn't it be best to see if those units can be contributed to FPC's FCL? The ultimate "don't reinvent the wheel" location. ;-)

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> How did you get a codepage 20127 Mac?

The Mac is UTF-8, but cwstring or whatever does not realize it. Since I cannot easily step into it with the debugger, I can't tell you why. Terminal has LC_CTYPE=UTF-8.

Well, I will just set the default codepages manually.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016 11:34:18 +0100
Graeme Geldenhuys wrote:

> On 2016-04-04 11:23, tobiasgie...@gmail.com wrote:
>> On Mac, not even cwstring does that. It sets the DefaultSystemCodePage
>> to 20127.
>
> I just installed FPC 3.0 on my Macbook Pro (bought in the UK) and did the same test. Here DefaultSystemCodePage returned 65001. So I guess it depends on your OSX installation and which default locale settings were set up during install.

All my Macs since 10.4 had UTF-8 as default, and I can't remember a setting during install to change it. How did you get a codepage 20127 Mac?

Mattias
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 11:40, Mattias Gaertner wrote:
> Or simply copy the two units FPCAdds, LazUTF8 or parts of them from
> here:

Thank you Juha and Mattias - I'll take a look at those to see what they do.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016 13:27:05 +0300
Juha Manninen wrote:

>[...]
> But yes, it requires Lazarus IDE because LazUtils is a Lazarus
> package. At least you must create and compile the project using
> Lazarus IDE.

Or simply copy the two units FPCAdds, LazUTF8 or parts of them from here:
http://svn.freepascal.org/svn/lazarus/tags/lazarus_1_6/components/lazutils/

Mattias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 11:23, tobiasgie...@gmail.com wrote:
> On Mac, not even cwstring does that. It sets the DefaultSystemCodePage
> to 20127.

I just installed FPC 3.0 on my Macbook Pro (bought in the UK) and did the same test. Here DefaultSystemCodePage returned 65001. So I guess it depends on your OSX installation and which default locale settings were set up during install.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016 10:52:20 +0100
Graeme Geldenhuys wrote:

> On 2016-04-04 10:27, Juha Manninen wrote:
>> Just use the new UTF-8 mode provided by Lazarus and remove all
>> explicit conversion functions.
>
> This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so
> making such a suggestion is wishful thinking. For example, your
> suggestion means nothing to me, I don't use LCL.

First of all, it's part of LazUtils, so you don't have to use the LCL for that. In fact you don't have to use LazUtils: some users simply copied the two units FPCAdds and LazUTF8. It's all open source.

Second, I find it funny that the statement comes from you - a notorious promoter of software on forums/lists of competing projects.

And third, setting the DefaultSystemCodePage is a good start, but not enough. Instead of explaining all the gory details, Juha promoted a more complete solution for UTF-8. This is useful for many users. They don't have to reinvent the wheel.

Mattias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
> On 2016-04-04 10:43, Michael Van Canneyt wrote:
>> No. It says 'typically'.
>>
>> That doesn't necessarily mean it is so, but is mostly correct.
>
> It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my FreeBSD system too. So I should read the wiki as "typically incorrect" ;-)
>
>> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ. I know people that still use ISO-8859 (or something similar). Only the programmer can know what is correct.
>
> Just curious, how do you change the default codepage on a Linux system? By exporting a new value to the LANG environment variable?

Yes. And various LC_ environment vars.

Michael.
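For checking which of these variables a given process actually sees (a Terminal session versus a GUI launch, for instance), a minimal diagnostic sketch; the program name is illustrative. It uses SysUtils.GetEnvironmentVariable and lists the variables in POSIX precedence order, LC_ALL first.

```pascal
program ShowLocaleEnvironment; // illustrative name

{$mode objfpc}{$H+}

uses
  {$ifdef unix} cwstring, {$endif} // without it DefaultSystemCodePage stays ASCII on Unix
  SysUtils;

procedure ShowVar(const Name: string);
var
  Value: string;
begin
  Value := GetEnvironmentVariable(Name);
  if Value = '' then
    WriteLn(Name, ' is not set')
  else
    WriteLn(Name, '=', Value);
end;

begin
  // POSIX precedence: the first of these that is set governs the character encoding.
  ShowVar('LC_ALL');
  ShowVar('LC_CTYPE');
  ShowVar('LANG');
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Running it once normally and once as `LC_ALL=C ./ShowLocaleEnvironment` from a shell should show whether a change in the environment, rather than the RTL itself, is what changes DefaultSystemCodePage.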
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, Apr 4, 2016 at 12:52 PM, Graeme Geldenhuys wrote:
> This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so
> making such a suggestion is wishful thinking. For example, your
> suggestion means nothing to me, I don't use LCL.

Yes, I should have mentioned that this feature does not require LCL. It only requires the LazUtils package and the LazUTF8 unit in your uses section. It can be used in command-line and server programs, and I guess in fpGUI too, although I have not tested it. But yes, it requires the Lazarus IDE because LazUtils is a Lazarus package. At least you must create and compile the project using the Lazarus IDE.

Anyway, this UTF-8 mode does more than set the default String encoding. It also provides proper UTF-8 functions as backends for the RTL's Ansi...() string functions. It also uses cwstring, although cwstring pulls in clib. Then typical users' code is amazingly Delphi compatible despite the different encoding, because code only seldom deals with individual codepoints beyond 7-bit ASCII.

Juha
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> OK, I just confirmed. Adding clocale to my 5-line test program doesn't
> affect the DefaultSystemCodePage result, but as soon as I add cwstring
> to the uses clause, then DefaultSystemCodePage returns 65001.

On Mac, not even cwstring does that. It sets the DefaultSystemCodePage to 20127. So, on Mac, the DefaultSystemCodePage is not "typically" set to UTF_8. It is never set to UTF_8 unless you do it yourself.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:58, Jonas Maebe wrote:
> I have now updated the page to reflect this
> fact.

Thanks Jonas, that's an important point to note.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 11:07, Michael Van Canneyt wrote:
> if you are dealing with UTF8/UnicodeString and other codepages, it is
> quite likely, and preferred, that you have unit cwstring included.

OK, I just confirmed. Adding clocale to my 5-line test program doesn't affect the DefaultSystemCodePage result, but as soon as I add cwstring to the uses clause, then DefaultSystemCodePage returns 65001.

I never use the UnicodeString type, and was under the impression that only UnicodeString (and WideString) require the cwstring unit to function correctly. I use UTF-8 encoded text everywhere (stored inside String), so I never bothered with the cwstring unit. But clearly in FPC 3.0 it is vital to include cwstring if you do anything text related.

I found it really funny when Delphi introduced Unicode support, and how everybody struggled. I thought: "what a mess". Now FPC developers seem to be in the same boat. :-/

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:43, Michael Van Canneyt wrote:
> No. It says 'typically'.
>
> That doesn't necessarily mean it is so, but is mostly correct.

It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my FreeBSD system too. So I should read the wiki as "typically incorrect" ;-)

> No, that would be wrong. It normally is CP_UTF8 on Linux. But it can differ.
> I know people that still use ISO-8859 (or something similar). Only the
> programmer can know what is correct.

Just curious, how do you change the default codepage on a Linux system? By exporting a new value to the LANG environment variable?

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote:
> On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>>
>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0. So the "typically" seems more like "unlikely in most cases".

Not necessarily: if you are dealing with UTF8/UnicodeString and other codepages, it is quite likely, and preferred, that you have unit cwstring included.

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On 2016-04-04 10:39, tobiasgie...@gmail.com wrote:
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses
> here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Indeed, and on my FreeBSD system DefaultSystemCodePage returns 0. So the "typically" seems more like "unlikely in most cases".

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Jonas Maebe wrote:
> Michael Van Canneyt wrote on Mon, 04 Apr 2016:
>> Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.
>
> I did forget to mention that the Default*CodePage variables are only initialised with the "real" values on *nix platforms if you include a widestring manager unit. I have now updated the page to reflect this fact.

I will update the docs with such info for the upcoming 3.0.2. It clearly needs mentioning...

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
Michael Van Canneyt wrote on Mon, 04 Apr 2016:
> Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.

I did forget to mention that the Default*CodePage variables are only initialised with the "real" values on *nix platforms if you include a widestring manager unit. I have now updated the page to reflect this fact.

Jonas
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 10:27, Juha Manninen wrote:
> Just use the new UTF-8 mode provided by Lazarus and remove all
> explicit conversion functions.

This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so making such a suggestion is wishful thinking. For example, your suggestion means nothing to me; I don't use LCL.

Regards,
  - Graeme -
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
tobiasgiesen wrote on Mon, 04 Apr 2016:
>> Your question was not about Lazarus but maybe you should read this:
>> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

That's because you use neither the cwstring unit nor another widestring manager. In that case, the DefaultSystemCodePage is set to ASCII because the default string conversion routines on Unix platforms do not support anything else (neither for converting from ansi/shortstring to wide/unicodestring, nor between ansistrings using different codepages). This is not new with FPC 3.0; it has always been like that.

Jonas
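A minimal sketch of a test program along the lines of the short ones discussed in this thread (the programs actually used were not posted, so this one is illustrative). It prints the three Default*CodePage variables; on Unix, removing cwstring from the uses clause should reproduce the CP_ASCII (20127) value described above.

```pascal
program ShowDefaultCodePages; // illustrative name

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring, // remove this line to reproduce the CP_ASCII default on Unix
  {$endif}
  SysUtils;

begin
  WriteLn('DefaultSystemCodePage        = ', DefaultSystemCodePage);
  WriteLn('DefaultFileSystemCodePage    = ', DefaultFileSystemCodePage);
  WriteLn('DefaultRTLFileSystemCodePage = ', DefaultRTLFileSystemCodePage);
  if DefaultSystemCodePage = CP_ASCII then
    WriteLn('CP_ASCII: no widestring manager, or the locale maps to ASCII (e.g. LC_ALL=C)');
end.
```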
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>>>> Your question was not about Lazarus but maybe you should read this:
>>>> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>>>
>>> Very interesting, but apparently there is some wrong info.
>>>
>>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>>>
>>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>>>
>>> Should I fix this bit on the documentation page?
>>
>> No. It says 'typically'.
>
> That's the first part of the sentence. The second part says: "the RTL uses here by default CP_UTF8." That is wrong. It does not (at least not on Mac). This must be fixed.

Why do you think it is wrong? It is perhaps wrong on your computer. Maybe it checks the environment? Did you include clocale? Etc. There are maybe a hundred things that can change this.

Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters.

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016 11:35:53 +0200 (CEST)
Michael Van Canneyt wrote:

>[...]
>>> Then no conversions will be done for all ansistrings that contain UTF8.
>>
>> And this really means AnsiString, not AnsiString(something).
>
> The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

UTF8String is type AnsiString(CP_UTF8), and if you mix that with AnsiString, the compiler adds conversion code, because at compile time CP_ACP is not UTF-8. These kinds of traps confuse people.

Mattias
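A minimal sketch of the trap described above: the assignment `A := U` makes the compiler insert a conversion from CP_UTF8 to CP_ACP, and what that conversion does depends on DefaultSystemCodePage at run time. The program name is illustrative, and no output is shown because it depends on the widestring manager and the locale.

```pascal
program MixedStringTypes; // illustrative name

{$mode objfpc}{$H+}

uses
  {$ifdef unix} cwstring, {$endif} // needed for real code page conversions on Unix
  SysUtils;

var
  W: UnicodeString;
  U: UTF8String;  // AnsiString(CP_UTF8)
  A: AnsiString;  // AnsiString(CP_ACP), i.e. DefaultSystemCodePage at run time

begin
  W := WideChar($E9);  // U+00E9, LATIN SMALL LETTER E WITH ACUTE
  U := UTF8Encode(W);  // two bytes ($C3 $A9), code page CP_UTF8
  A := U;              // implicit CP_UTF8 -> CP_ACP conversion inserted here
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
  WriteLn('U: code page ', StringCodePage(U), ', ', Length(U), ' bytes');
  WriteLn('A: code page ', StringCodePage(A), ', ', Length(A), ' bytes');
end.
```

With cwstring and a UTF-8 locale both code pages match, so `A` is expected to keep both bytes. With DefaultSystemCodePage = CP_ASCII the character cannot be represented and is likely to be replaced (for example by '?'), which matches the question-mark data loss reported in the TStringList thread.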
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>>> Your question was not about Lazarus but maybe you should read this:
>>> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>>
>> Very interesting, but apparently there is some wrong info.
>>
>> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>>
>> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>>
>> Should I fix this bit on the documentation page?
>
> No. It says 'typically'.

That's the first part of the sentence. The second part says: "the RTL uses here by default CP_UTF8." That is wrong. It does not (at least not on Mac). This must be fixed.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
>> Your question was not about Lazarus but maybe you should read this:
>> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
>
> Very interesting, but apparently there is some wrong info.
>
> It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."
>
> Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.
>
> Should I fix this bit on the documentation page?

No. It says 'typically'. That doesn't necessarily mean it is so, but it is mostly correct.

> Apparently the LCL assumes that FCL sets it to UTF-8, but it does not. I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

No, that would be wrong. It normally is CP_UTF8 on Linux, but it can differ. I know people that still use ISO-8859 (or something similar). Only the programmer can know what is correct.

Michael.
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> Your question was not about Lazarus but maybe you should read this:
> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Very interesting, but apparently there is some wrong info.

It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8."

Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001.

Should I fix this bit on the documentation page?

Apparently the LCL assumes that FCL sets it to UTF-8, but it does not. I think the LCL should explicitly set it to CP_UTF8 on Mac and Linux.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Mattias Gaertner wrote:
> On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
> Michael Van Canneyt wrote:
>
>> [...]
>> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
>
> I think it is important to note how to do this properly:
>
>   SetMultiByteConversionCodePage(CP_UTF8);
>   SetMultiByteRTLFileSystemCodePage(CP_UTF8);
>
> You should add these lines in an early initialization section. The beginning of your program might be too late.
>
>> Then no conversions will be done for all ansistrings that contain UTF8.
>
> And this really means AnsiString, not AnsiString(something).

The latter cannot contain UTF8 unless you do some really nasty tricks... :-)

Michael
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016 10:32:58 +0200 (CEST)
Michael Van Canneyt wrote:

> [...]
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.

I think it is important to note how to do this properly:

  SetMultiByteConversionCodePage(CP_UTF8);
  SetMultiByteRTLFileSystemCodePage(CP_UTF8);

You should add these lines in an early initialization section. The beginning of your program might be too late.

> Then no conversions will be done for all ansistrings that contain UTF8.

And this really means AnsiString, not AnsiString(something).

Mattias
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, Apr 4, 2016 at 11:18 AM, wrote:
> I use TStringList for UTF-8 strings. This is no longer possible, because
> automatic conversions cause question marks and data loss.

You are completely lost with this issue. The automatic conversion of encodings is a big step forward. Just use the new UTF-8 mode provided by Lazarus and remove all explicit conversion functions.
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus

Juha
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, Apr 4, 2016 at 11:36 AM, wrote:
> Sorry, I was not able to come to that conclusion from the existing docs.

Your question was not about Lazarus but maybe you should read this:
http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus
It also works without the LCL. The bottom line is: remove all explicit conversion functions.

Juha
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
tobiasgiesen wrote on Mon, 04 Apr 2016:
>> Then please update the wiki - it is user editable.
> Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility
> I hope this is correct.

It is incorrect in the sense that there is nothing utf8-specific about the way your code (ab)used ansistrings. I will fix it, since that page is more or less part of the official FPC documentation (it's linked from the FPC 3.0 release notes).

Jonas
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
> Then please update the wiki - it is user editable.

Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility
I hope this is correct.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On 2016-04-04 09:43, tobiasgie...@gmail.com wrote:
> Very theoretical. What you really need to tell
> people is something like this:

Then please update the wiki - it is user editable.

Even a seasoned developer such as myself still needs to get my head around all this FPC Unicode stuff, so any information and tips on the wiki would be greatly appreciated.

I haven't moved to FPC 3.0 yet, but when I do, I too will have lots of testing to do in my own code. I don't use the LCL, but I have been storing UTF-8 text inside AnsiStrings for years (on all platforms).

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key: http://tinyurl.com/graeme-pgp
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
> > I use TStringList for UTF-8 strings. This is no longer possible, because
> > automatic conversions cause question marks and data loss.
>
> Lazarus uses TStringList with UTF-8 all over the place.
>
> Please post a complete example demonstrating the problem.

Sorry - this was only theoretical, because of the Backward compatibility section on the FPC Unicode Support page. It says that a "defined way" to use strings is "you do not store data in an ansistring that has been encoded using something else than the system's default code page, and subsequently pass this string as-is to an FPC RTL routine". That would mean I cannot use TStringList for UTF-8. The paragraph is misleading, really. Very theoretical. What you really need to tell people is something like this: "Unicode aware Pascal code needs to set DefaultSystemCodePage to CP_UTF8".

I am sorry but I was really shocked this morning when I saw the question marks :)

Cheers,
Tobias
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8.
> Then no conversions will be done for all ansistrings that contain UTF8.

Fantastic. Many thanks. That fixes my problem entirely (I think). Sorry, I was not able to come to that conclusion from the existing docs.

Cheers,
Tobias
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote:
> Hello,
>
> disallowing "AnsiString" code for UTF-8 is a huge regression.
>
> I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss.

Same answer as in my other mail. Set DefaultSystemCodePage to CP_UTF8.

> I also use a large amount of third-party libraries that use the AnsiString data type for UTF-8. I really want to use FPC 3 due to other things, but this is a deal breaker. Why not add a simple switch or even a run-time Boolean global variable to turn off codepage conversions?
>
> It behaves differently from Delphi too.

This depends on the version of Delphi :)

Michael.
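Michael's answer can be checked with a small program that keeps UTF-8 bytes in a TStringList. A minimal sketch, assuming FPC 3.x: it calls SetMultiByteConversionCodePage at the start of the main block, which is enough for a self-contained test like this one (in a real application the call belongs in an early initialization section, as Mattias points out). The two Chinese characters are written as UTF-16 code points (U+4E2D, U+6587) so the example does not depend on the encoding of the source file.

```pascal
program Utf8StringList;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}
  Classes, SysUtils;

var
  sl: TStringList;
  u, s: AnsiString;
  i: Integer;
begin
  // Declare that plain AnsiString data is UTF-8 (sets DefaultSystemCodePage).
  SetMultiByteConversionCodePage(CP_UTF8);

  // Two CJK characters encoded as UTF-8: 3 bytes each, 6 bytes in total.
  u := UTF8Encode(UnicodeString(#$4E2D#$6587));

  sl := TStringList.Create;
  try
    sl.Add(u);
    s := sl[0];
    // Dump the stored bytes in hex: they should still be UTF-8, not '?'.
    for i := 1 to Length(s) do
      Write(IntToHex(Ord(s[i]), 2), ' ');
    WriteLn;
  finally
    sl.Free;
  end;
end.
```

With the code page set this way, the dump should show the six UTF-8 bytes E4 B8 AD E6 96 87.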
Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
On Mon, 4 Apr 2016, Tobias Giesen wrote:
> Hello,
>
> my application uses the AnsiString type to store UTF-8 data. That was totally fine. Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example.
>
> I do not fully understand at which points these conversions are done. The FPC 3 Unicode documentation says something about "passing it to a RTL routine". What about this code:
>
> var a,b:Ansistring;
> begin
>   a:=Utf8Encode(AWideString);
>   b:=Copy(a,1,10);
> end;
>
> Is "Copy" an RTL routine? Is this OK or not?
>
> Best for me would be to be able to turn the conversions off completely.

You cannot, but you can set DefaultSystemCodePage to CP_UTF8. Then no conversions will be done for all ansistrings that contain UTF8.

Michael.
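Tobias's snippet can be turned into a complete program to try Michael's answer. A minimal sketch, assuming FPC 3.x: AWideString from the original is replaced by a UnicodeString variable w holding two Chinese characters, given as UTF-16 code points so the source file encoding does not matter. With DefaultSystemCodePage set to CP_UTF8, the UTF-8 bytes produced by Utf8Encode should pass through the assignment and Copy unchanged. Note that Copy on an AnsiString counts bytes, not characters.

```pascal
program CopyUtf8;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif}
  SysUtils;

var
  w: UnicodeString;
  a, b: AnsiString;
  i: Integer;
begin
  // Declare that plain AnsiString data is UTF-8 (sets DefaultSystemCodePage).
  SetMultiByteConversionCodePage(CP_UTF8);

  w := #$4E2D#$6587;     // two CJK characters (U+4E2D, U+6587)
  a := Utf8Encode(w);    // 6 bytes of UTF-8
  b := Copy(a, 1, 10);   // byte-based copy; here it takes the whole 6-byte string

  WriteLn('Length(b) = ', Length(b));
  for i := 1 to Length(b) do
    Write(IntToHex(Ord(b[i]), 2), ' ');
  WriteLn;
end.
```

Because Copy works on bytes, a count that ends in the middle of a multi-byte sequence would cut a character in half. That is a separate issue from code page conversion.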
Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
On Mon, 04 Apr 2016 10:18:18 +0200
tobiasgie...@gmail.com wrote:

> Hello,
>
> disallowing "AnsiString" code for UTF-8 is a huge regression.
>
> I use TStringList for UTF-8 strings. This is no longer possible, because
> automatic conversions cause question marks and data loss.

Lazarus uses TStringList with UTF-8 all over the place.

Please post a complete example demonstrating the problem.

Mattias
[fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?
Hello,

disallowing "AnsiString" code for UTF-8 is a huge regression.

I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss.

I also use a large amount of third-party libraries that use the AnsiString data type for UTF-8. I really want to use FPC 3 due to other things, but this is a deal breaker. Why not add a simple switch or even a run-time Boolean global variable to turn off codepage conversions?

It behaves differently from Delphi too.

Cheers,
Tobias
[fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion
Hello,

my application uses the AnsiString type to store UTF-8 data. That was totally fine. Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example.

I do not fully understand at which points these conversions are done. The FPC 3 Unicode documentation says something about "passing it to a RTL routine". What about this code:

var a,b:Ansistring;
begin
  a:=Utf8Encode(AWideString);
  b:=Copy(a,1,10);
end;

Is "Copy" an RTL routine? Is this OK or not?

Best for me would be to be able to turn the conversions off completely. Is that possible?

Cheers,
Tobias