Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread wkitty42
On 04/04/2016 06:23 AM, tobiasgie...@gmail.com wrote: OK, I just confirmed. Adding clocale to my 5-line test program doesn't affect the DefaultSystemCodePage result, but as soon as I add cwstring to the uses clause, then DefaultSystemCodePage returns 65001. On Mac, not even cwstring does that.
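The kind of test described here can be approximated with a very small program. The following is only a sketch, assuming FPC 3.0 or later on a Unix-like target; the program name is made up, and the original 5-line test program is not shown in the thread.

```pascal
program cpcheck;  // hypothetical name, not from the thread

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,   // remove this line and recompile to compare the two results
  {$endif}
  SysUtils;   // only here so the uses clause stays valid on non-Unix targets

begin
  // DefaultSystemCodePage is declared in the System unit; 65001 is CP_UTF8.
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Building it once with and once without cwstring should show whether the unit changes the reported value on a given system. No particular output is claimed here, since the thread reports different values on different machines.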

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Seth Grover
> Not necessarily: > if you are dealing with UTF8/UnicodeString and other codepages, it is > quite likely, and preferred, that you have unit cwstring included. > Michael. Question about this: If I use cthreads, cwstring, and load my own memory manager, what should the order in the uses clause be?

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe
tobiasgiesen wrote on Mon, 04 Apr 2016: > Terminal has LC_CTYPE=UTF-8. What about LC_ALL? My Mac OS installations do not have LC_ALL. But I just noticed that Carbon GUI programs do not get LC_CTYPE in their environment either. If none of the environment variables related to code pages

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 13:15, Sven Barth wrote: > Qt uses UTF-16 as well... I always thought that strange. After all, Qt was born as a Unix-type GUI toolkit. Unless I got my facts wrong. Then again, it's only in recent years that Unix-like systems moved to UTF-8. I think even FreeBSD didn't use UTF-8 out

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Sven Barth
Am 04.04.2016 13:21 schrieb "Graeme Geldenhuys" < mailingli...@geldenhuys.co.uk>: > > On 2016-04-04 12:06, Michael Van Canneyt wrote: > > 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. > > On Windows, UnicodeString is more 'natural' or 'native'. > > Based on

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Jonas Maebe wrote: Michael Van Canneyt wrote on Mon, 04 Apr 2016: On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: [add LCL UTF-8 helper units to FPC] Though it could probably be added as quick as in FPC 3.0.2. It's simply two new units that need to be explicitly used by

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> > Terminal has LC_CTYPE=UTF-8. > > What about LC_ALL? My Mac OS installations do not have LC_ALL. But I just noticed that Carbon GUI programs do not get LC_CTYPE in their environment either. So maybe cwstring needs to be fixed for Carbon GUI Mac OS X programs. What I see in the environment

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: On 2016-04-04 12:06, Michael Van Canneyt wrote: 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. On Windows, UnicodeString is more 'natural' or 'native'. Based on Internet standards and most popular OSes (mobile

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 12:06, Michael Van Canneyt wrote: > 1. Using UTF8 is a choice of lazarus. Other people may prefer UnicodeString. > On Windows, UnicodeString is more 'natural' or 'native'. Based on Internet standards and most popular OSes (mobile devices included), UTF-8 is king - so we all know

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe
tobiasgiesen wrote on Mon, 04 Apr 2016: How did you get a codepage 20127 Mac? The Mac is UTF-8, This statement makes no sense. There is no "UTF-8" or "non-UTF-8" Mac. The Unix environment on OS X can use any OS-supported code page. but cwstring or whatever does not realize it.

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: more complete solution for UTF-8. This is useful for many users. They don't have to reinvent the wheel. Not having looked at the two units you mentioned... but if this is a general requirement for anybody using UTF-8 or similar with FPC 3.0, then

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:34, Mattias Gaertner wrote: > for that. In fact you don't have to use LazUtils: some users simply > copied the two units FPCAdds and LazUTF8. It's all open source. This was not made clear until you explicitly mentioned it. Juha's initial comment was vague on the matter, and the

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> How did you get a codepage 20127 Mac? The Mac is UTF-8, but cwstring or whatever does not realize it. Since I cannot easily step into it with the debugger, I can't tell you why. Terminal has LC_CTYPE=UTF-8. Well I will just set the default codepages manually. Cheers, Tobias
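When the debugger is not an option, one low-tech way to see what the process actually receives is to print the locale-related environment variables next to the code page the RTL reports. This is a sketch under assumptions: the program name is invented, LANG is added by me alongside the LC_ALL and LC_CTYPE variables named in the thread, and nothing here claims to reproduce how cwstring itself decides on a code page.

```pascal
program envcheck;  // hypothetical name, not from the thread

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,
  {$endif}
  SysUtils;   // provides GetEnvironmentVariable

const
  // LC_ALL and LC_CTYPE are discussed in the thread; LANG is an extra assumption.
  Names: array[0..2] of string = ('LC_ALL', 'LC_CTYPE', 'LANG');

var
  i: Integer;

begin
  // An unset variable is printed as an empty string.
  for i := Low(Names) to High(Names) do
    WriteLn(Names[i], ' = "', GetEnvironmentVariable(Names[i]), '"');
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);
end.
```

Running the same binary from a terminal and from a GUI launcher would make any difference in the inherited environment visible.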

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 11:34:18 +0100 Graeme Geldenhuys wrote: > On 2016-04-04 11:23, tobiasgie...@gmail.com wrote: > > On Mac, not even cwstring does that. It sets the DefaultSystemCodePage > > to 20127. > > I just installed FPC 3.0 on my Macbook Pro (bought in the

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:40, Mattias Gaertner wrote: > Or simply copy the two units FPCAdds, LazUTF8 or parts of them from > here: Thank you Juha and Mattias - I'll take a look at those to see what they do. Regards, - Graeme - ___ fpc-pascal maillist -

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 13:27:05 +0300 Juha Manninen wrote: >[...] > But yes, it requires Lazarus IDE because LazUtils is a Lazarus > package. At least you must create and compile the project using > Lazarus IDE. Or simply copy the two units FPCAdds, LazUTF8 or parts of

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:23, tobiasgie...@gmail.com wrote: > On Mac, not even cwstring does that. It sets the DefaultSystemCodePage > to 20127. I just installed FPC 3.0 on my Macbook Pro (bought in the UK) and did the same test. Here DefaultSystemCodePage returned 65001. So I guess it depends on your OSX

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:52:20 +0100 Graeme Geldenhuys wrote: > On 2016-04-04 10:27, Juha Manninen wrote: > > Just use the new UTF-8 mode provided by Lazarus and remove all > > explicit conversion functions. > > This is the FPC mailing list. Not everybody here uses

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: On 2016-04-04 10:43, Michael Van Canneyt wrote: No. It says 'typically'. That doesn't necessarily mean it is so, but is mostly correct. It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 12:52 PM, Graeme Geldenhuys wrote: > This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so > making such a suggestion is wishful thinking. For example, your > suggestion means nothing to me, I don't use LCL. Yes, I should

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> OK, I just confirmed. Adding clocale to my 5-line test program doesn't > affect the DefaultSystemCodePage result, but as soon as I add cwstring > to the uses clause, then DefaultSystemCodePage returns 65001. On Mac, not even cwstring does that. It sets the DefaultSystemCodePage to 20127. So,

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:58, Jonas Maebe wrote: > I have now updated the page to reflect this > fact. Thanks Jonas, that's an important point to note. Regards, - Graeme -

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 11:07, Michael Van Canneyt wrote: > if you are dealing with UTF8/UnicodeString and other codepages, it is > quite likely, and preferred, that you have unit cwstring included. OK, I just confirmed. Adding clocale to my 5-line test program doesn't affect the DefaultSystemCodePage

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:43, Michael Van Canneyt wrote: > > No. It says 'typically'. > > That doesn't necessarily mean it is so, but is mostly correct. It is still a very vague assumption. As I mentioned in my previous reply, that statement is false on my FreeBSD system too. So I should read the wiki

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Graeme Geldenhuys wrote: On 2016-04-04 10:39, tobiasgie...@gmail.com wrote: It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by default CP_UTF8." Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001. Indeed,

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:39, tobiasgie...@gmail.com wrote: > It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the > RTL uses > here by default CP_UTF8." > > Not on my Mac. DefaultSystemCodePage is 20127 before I set it to 65001. Indeed, and on my FreeBSD system

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Jonas Maebe wrote: Michael Van Canneyt wrote on Mon, 04 Apr 2016: Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters. I did forget to mention that the Default*CodePage variables are only initialised with the

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe
Michael Van Canneyt wrote on Mon, 04 Apr 2016: Don't be too hasty in changing things. Jonas (who created that page) is usually very careful in such matters. I did forget to mention that the Default*CodePage variables are only initialised with the "real" values on *nix platforms if you

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 10:27, Juha Manninen wrote: > Just use the new UTF-8 mode provided by Lazarus and remove all > explicit conversion functions. This is the FPC mailing list. Not everybody here uses Lazarus or LCL, so making such a suggestion is wishful thinking. For example, your suggestion means

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Jonas Maebe
tobiasgiesen wrote on Mon, 04 Apr 2016: Your question was not about Lazarus but maybe you should read this: http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus Very interesting, but apparently there is some wrong info. It says "On Linux and Mac OS X UTF-8 is typically the system

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: Your question was not about Lazarus but maybe you should read this: http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus Very interesting, but apparently there is some wrong

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 11:35:53 +0200 (CEST) Michael Van Canneyt wrote: >[...] > >> Then no conversions will be done for all ansistrings that contain UTF8. > > > > And this really means AnsiString, not AnsiString(something). > > The latter cannot contain UTF8 unless you do
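The distinction between a plain AnsiString and an AnsiString with a declared code page can be illustrated with a short sketch. Everything below is an assumption made for illustration: the type name TCP1252String, the choice of code page 1252, and the program name do not come from the thread, and the example does not try to complete the truncated sentence above.

```pascal
program cpdecl;  // hypothetical name, not from the thread

{$mode objfpc}{$H+}

type
  // AnsiString with a fixed, declared code page; 1252 is picked only as an example.
  TCP1252String = type AnsiString(1252);

var
  Plain: AnsiString;      // code page follows DefaultSystemCodePage
  Fixed: TCP1252String;   // code page is always the declared one

begin
  SetMultiByteConversionCodePage(CP_UTF8);
  Plain := 'abc';
  // Assigning between the two types lets the RTL convert between code pages.
  Fixed := Plain;
  WriteLn('Plain: ', StringCodePage(Plain));
  WriteLn('Fixed: ', StringCodePage(Fixed));
end.
```

StringCodePage reports the code page recorded in each string's header, which is a convenient way to check what the compiler and RTL actually did with a given assignment.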

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> > > On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: > > >> Your question was not about Lazarus but maybe you should read this: > >> http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus > > > > Very interesting, but apparently there is some wrong info. > > > > It says "On Linux and

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: Your question was not about Lazarus but maybe you should read this: http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus Very interesting, but apparently there is some wrong info. It says "On Linux and Mac OS X UTF-8 is typically

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> Your question was not about Lazarus but maybe you should read this: > http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus Very interesting, but apparently there is some wrong info. It says "On Linux and Mac OS X UTF-8 is typically the system codepage, so the RTL uses here by

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Mattias Gaertner wrote: On Mon, 4 Apr 2016 10:32:58 +0200 (CEST) Michael Van Canneyt wrote: [...] You cannot, but you can set DefaultSystemCodePage to CP_UTF8. I think it is important to note how to do this properly:

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Mattias Gaertner
On Mon, 4 Apr 2016 10:32:58 +0200 (CEST) Michael Van Canneyt wrote: >[...] > You cannot, but you can set DefaultSystemCodePage to CP_UTF8. I think it is important to note how to do this properly: SetMultiByteConversionCodePage(CP_UTF8);
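A minimal sketch of the call suggested here, assuming FPC 3.0 or later. The program name is invented, and placing the call first in the main block is my assumption about where it is most useful, so that it takes effect before any strings are processed.

```pascal
program forceutf8;  // hypothetical name, not from the thread

{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,
  {$endif}
  SysUtils;   // only here so the uses clause stays valid on non-Unix targets

begin
  WriteLn('before: ', DefaultSystemCodePage);
  // Use the System unit procedure from the thread to select UTF-8
  // as the code page for plain AnsiString data.
  SetMultiByteConversionCodePage(CP_UTF8);
  WriteLn('after:  ', DefaultSystemCodePage);
end.
```

The "before" value depends on the platform and on whether cwstring is linked in, as the rest of the thread shows, so no fixed output is given.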

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:18 AM, wrote: > I use TStringList for UTF-8 strings. This is no longer possible, because > automatic conversions cause question marks and data loss. You are completely lost with this issue. The automatic conversion of encodings is a big step

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Juha Manninen
On Mon, Apr 4, 2016 at 11:36 AM, wrote: > Sorry, I was not able to come to that conclusion from the existing docs. Your question was not about Lazarus but maybe you should read this: http://wiki.freepascal.org/Better_Unicode_Support_in_Lazarus It works also without

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Jonas Maebe
tobiasgiesen wrote on Mon, 04 Apr 2016: Then please update the wiki - it is user editable. Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility I hope this is correct. It is incorrect in the sense that there is nothing utf8-specific about the way your code

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> Then please update the wiki - it is user editable. Done: http://wiki.freepascal.org/FPC_Unicode_support#Backward_compatibility I hope this is correct. Cheers, Tobias

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Graeme Geldenhuys
On 2016-04-04 09:43, tobiasgie...@gmail.com wrote: > Very theoretical. What you really need to tell > people is something like this: Then please update the wiki - it is user editable. Even a seasoned developer such as myself still needs to get my head around all this FPC Unicode stuff. So any

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
> > I use TStringList for UTF-8 strings. This is no longer possible, because > > automatic conversions cause question marks and data loss. > > Lazarus uses TStringList with UTF-8 all over the place. > > Please post a complete example demonstrating the problem. Sorry - this was only theoretical,

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread tobiasgiesen
> You cannot, but you can set DefaultSystemCodePage to CP_UTF8. > Then no conversions will be done for all ansistrings that contain UTF8. Fantastic. Many thanks. That fixes my problem entirely (I think). Sorry, I was not able to come to that conclusion from the existing docs. Cheers, Tobias

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, tobiasgie...@gmail.com wrote: Hello, disallowing "AnsiString" code for UTF-8 is a huge regression. I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss. Same answer as in my other mail. Set

Re: [fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Michael Van Canneyt
On Mon, 4 Apr 2016, Tobias Giesen wrote: Hello, my application uses the AnsiString type to store UTF-8 data. That was totally fine. Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example. I do not fully understand at which points

Re: [fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread Mattias Gaertner
On Mon, 04 Apr 2016 10:18:18 +0200 tobiasgie...@gmail.com wrote: > Hello, > > disallowing "AnsiString" code for UTF-8 is a huge regression. > > I use TStringList for UTF-8 strings. This is no longer possible, because > automatic conversions cause question marks and data loss. Lazarus uses

[fpc-pascal] FPC 3 regression: cannot use TStringList for UTF-8 data any more?

2016-04-04 Thread tobiasgiesen
Hello, disallowing "AnsiString" code for UTF-8 is a huge regression. I use TStringList for UTF-8 strings. This is no longer possible, because automatic conversions cause question marks and data loss. I also use a large amount of third-party libraries that use the AnsiString data type for UTF-8.

[fpc-pascal] FPC 3: disabling automatic AnsiString codepage conversion

2016-04-04 Thread Tobias Giesen
Hello, my application uses the AnsiString type to store UTF-8 data. That was totally fine. Now in FPC 3, automatic conversions cause data loss. I get question marks replacing Chinese characters, for example. I do not fully understand at which points these conversions are done. The FPC 3 Unicode