Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 03.01.2020 11:09, Michael Van Canneyt wrote:
> I also think it is very hypothetical, and not a problem unless you want to use the same stream in Delphi and FPC.
>
> Well, you have my blessing for the soPreserveBOM :)

Added in r43848. I'll check how the FPC documentation works and try to add it there.

Ondrej
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
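For readers who want to try the new option, the following is a minimal sketch of how soPreserveBOM might be used. It assumes an FPC build that already contains r43848 and that the option behaves as described in this thread (LoadFrom*() sets WriteBOM according to whether the loaded file had a BOM). The procedure name AppendLine is only illustrative.

```pascal
program PreserveBomOption;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Appends one line to a text file. With soPreserveBOM in Options, the
// BOM state of the file is expected to survive the load/save round trip
// (assumption based on the description in this thread).
procedure AppendLine(const AFileName, ALine: string);
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Options := SL.Options + [soPreserveBOM];
    SL.LoadFromFile(AFileName);
    SL.Add(ALine);
    SL.SaveToFile(AFileName);
  finally
    SL.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    AppendLine(ParamStr(1), 'appended line');
end.
```

On compilers without the option this does not compile; there, the BOM state has to be tracked by hand.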
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Fri, 3 Jan 2020, Ondrej Pokorny wrote:
> On 03.01.2020 10:14, Michael Van Canneyt wrote:
> > On Fri, 3 Jan 2020, Ondrej Pokorny wrote:
> > > On 03.01.2020 00:35, Werner Pamler wrote:
> > > > On 02.01.2020 at 20:10, Ondrej Pokorny wrote:
> > > > > TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written.
> > > >
> > > > There is also the problem that currently it is not possible, without further action, to retain the BOM state of a file loaded into a stringlist, modified and written back because the presence of a BOM is forgotten after reading - see the other discussion (https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042363.html).
> > > >
> > > > Wouldn't it make sense to introduce a (Read)BOM property (a boolean parameter or an element of the new Options) which gets its value when the file is loaded or the strings are assigned? Then the user can set the StringList.WriteBOM equal to the StringList.BOM when he wants to keep the BOM for writing back.
> > >
> > > Yes, that is perfectly reasonable.
> > >
> > > I'd prefer a new element in Options but there is the risk that Delphi adds a new option in the future and then we'll have a problem. So maybe a "PreserveBOM" or "SetWriteBOMOnLoad" property will be better. When set to true, WriteBOM will be set in LoadFrom*() according to BOM presence of the loaded file.
> >
> > I don't see why a new option is a problem? They are not streamed anyway.
> >
> > So I would do the opposite, add an option: soPreserveBOM.
>
> If you are fine with it, me as well.
>
> Yes, the problem is if somebody streams the property or uses Ord(soPreserveBOM) for something etc. I admit that it is a very hypothetical issue.

I also think it is very hypothetical, and not a problem unless you want to use the same stream in Delphi and FPC.

Well, you have my blessing for the soPreserveBOM :)

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 03.01.2020 10:14, Michael Van Canneyt wrote:
> On Fri, 3 Jan 2020, Ondrej Pokorny wrote:
> > On 03.01.2020 00:35, Werner Pamler wrote:
> > > On 02.01.2020 at 20:10, Ondrej Pokorny wrote:
> > > > TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written.
> > >
> > > There is also the problem that currently it is not possible, without further action, to retain the BOM state of a file loaded into a stringlist, modified and written back because the presence of a BOM is forgotten after reading - see the other discussion (https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042363.html).
> > >
> > > Wouldn't it make sense to introduce a (Read)BOM property (a boolean parameter or an element of the new Options) which gets its value when the file is loaded or the strings are assigned? Then the user can set the StringList.WriteBOM equal to the StringList.BOM when he wants to keep the BOM for writing back.
> >
> > Yes, that is perfectly reasonable.
> >
> > I'd prefer a new element in Options but there is the risk that Delphi adds a new option in the future and then we'll have a problem. So maybe a "PreserveBOM" or "SetWriteBOMOnLoad" property will be better. When set to true, WriteBOM will be set in LoadFrom*() according to BOM presence of the loaded file.
>
> I don't see why a new option is a problem? They are not streamed anyway.
>
> So I would do the opposite, add an option: soPreserveBOM.

If you are fine with it, me as well.

Yes, the problem is if somebody streams the property or uses Ord(soPreserveBOM) for something etc. I admit that it is a very hypothetical issue.

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Fri, 3 Jan 2020, Ondrej Pokorny wrote:
> On 03.01.2020 00:35, Werner Pamler wrote:
> > On 02.01.2020 at 20:10, Ondrej Pokorny wrote:
> > > TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written.
> >
> > There is also the problem that currently it is not possible, without further action, to retain the BOM state of a file loaded into a stringlist, modified and written back because the presence of a BOM is forgotten after reading - see the other discussion (https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042363.html).
> >
> > Wouldn't it make sense to introduce a (Read)BOM property (a boolean parameter or an element of the new Options) which gets its value when the file is loaded or the strings are assigned? Then the user can set the StringList.WriteBOM equal to the StringList.BOM when he wants to keep the BOM for writing back.
>
> Yes, that is perfectly reasonable.
>
> I'd prefer a new element in Options but there is the risk that Delphi adds a new option in the future and then we'll have a problem. So maybe a "PreserveBOM" or "SetWriteBOMOnLoad" property will be better. When set to true, WriteBOM will be set in LoadFrom*() according to BOM presence of the loaded file.

I don't see why a new option is a problem? They are not streamed anyway.

So I would do the opposite, add an option: soPreserveBOM.

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 03.01.2020 00:35, Werner Pamler wrote:
> On 02.01.2020 at 20:10, Ondrej Pokorny wrote:
> > TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written.
>
> There is also the problem that currently it is not possible, without further action, to retain the BOM state of a file loaded into a stringlist, modified and written back because the presence of a BOM is forgotten after reading - see the other discussion (https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042363.html).
>
> Wouldn't it make sense to introduce a (Read)BOM property (a boolean parameter or an element of the new Options) which gets its value when the file is loaded or the strings are assigned? Then the user can set the StringList.WriteBOM equal to the StringList.BOM when he wants to keep the BOM for writing back.

Yes, that is perfectly reasonable.

I'd prefer a new element in Options but there is the risk that Delphi adds a new option in the future and then we'll have a problem. So maybe a "PreserveBOM" or "SetWriteBOMOnLoad" property will be better. When set to true, WriteBOM will be set in LoadFrom*() according to BOM presence of the loaded file.

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 02.01.2020 at 20:10, Ondrej Pokorny wrote:
> TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written.

There is also the problem that currently it is not possible, without further action, to retain the BOM state of a file loaded into a stringlist, modified and written back because the presence of a BOM is forgotten after reading - see the other discussion (https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042363.html).

Wouldn't it make sense to introduce a (Read)BOM property (a boolean parameter or an element of the new Options) which gets its value when the file is loaded or the strings are assigned? Then the user can set the StringList.WriteBOM equal to the StringList.BOM when he wants to keep the BOM for writing back.
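The "further action" mentioned above could look roughly like the sketch below: check for the UTF-8 BOM bytes before loading, then set WriteBOM accordingly before saving. It only handles UTF-8 files; the names FileHasUtf8Bom and AppendLineKeepingBom are made up for this illustration and are not part of the RTL.

```pascal
program KeepBomManually;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Returns True when the file starts with the UTF-8 BOM bytes EF BB BF.
function FileHasUtf8Bom(const AFileName: string): Boolean;
var
  Stream: TFileStream;
  Head: array[0..2] of Byte;
begin
  Result := False;
  Stream := TFileStream.Create(AFileName, fmOpenRead or fmShareDenyWrite);
  try
    // Files shorter than three bytes cannot carry a UTF-8 BOM.
    if Stream.Read(Head, SizeOf(Head)) = SizeOf(Head) then
      Result := (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF);
  finally
    Stream.Free;
  end;
end;

// Loads a UTF-8 file, appends a line and saves it again, writing a BOM
// only if the original file had one.
procedure AppendLineKeepingBom(const AFileName, ALine: string);
var
  SL: TStringList;
  HadBom: Boolean;
begin
  HadBom := FileHasUtf8Bom(AFileName);
  SL := TStringList.Create;
  try
    SL.LoadFromFile(AFileName, TEncoding.UTF8);
    SL.Add(ALine);
    SL.WriteBOM := HadBom;
    SL.SaveToFile(AFileName, TEncoding.UTF8);
  finally
    SL.Free;
  end;
end;

begin
  if ParamCount >= 1 then
    AppendLineKeepingBom(ParamStr(1), 'appended line');
end.
```

The BOM check is done on the raw bytes, so it does not depend on what the string list remembers after loading.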
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 27.12.2019 12:01, Ondrej Pokorny wrote:
> On 27.12.2019 10:40, Michael Van Canneyt wrote:
> > > Yes, indeed. Therefore I suggested
> > >
> > > * TEncoding.Default for the DefaultSystemCodePage variable and
> > > * TEncoding.ANSI for the system encoding.
> > >
> > > Currently we have
> > >
> > > * TEncoding.SystemEncoding for the DefaultSystemCodePage variable and
> > > * both TEncoding.ANSI and TEncoding.Default for the system encoding.
> > >
> > > (TEncoding.ANSI and TEncoding.Default are equal in FPC.)
> >
> > In that case, why not simply change:
> >
> >   class function TEncoding.GetDefault: TEncoding;
> >   begin
> >     Result := GetSystemEncoding;
> >   end;
> >
> > Nothing need be removed.
> >
> > I consider SystemEncoding a better name than Default, and the latter should only be kept for Delphi compatibility. IMHO it would be better to avoid Default, in fact I would change references to Default to SystemEncoding for clarity. Default is completely non-descriptive.
> >
> > If I understand your reasoning correctly, that should solve the problems you report?
>
> Yes, that perfectly solves the issues Lazarus developers and users face. I am OK with this solution as well. Thanks!

I applied the change

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetSystemEncoding;
  end;

in r43842 before it gets forgotten.

I removed the ANSI-hack from Lazarus as well - in r62474.

Please note that in Lazarus (where the system encoding is UTF-8), TStrings.SaveTo*() writes BOM by default. Formerly the BOM was not written. Bart reported the issue here: https://lists.freepascal.org/pipermail/fpc-devel/2020-January/042372.html

Ondrej
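To see which code pages the three TEncoding properties report on a given system, a small test program along these lines may help. It is only a sketch and assumes an FPC version that provides TEncoding.SystemEncoding; the helper DefaultMatchesSystemEncoding is a name invented for this example.

```pascal
program ShowEncodings;
{$mode objfpc}{$H+}

uses
  SysUtils;

// True when TEncoding.Default and TEncoding.SystemEncoding report the
// same code page. This is expected after the GetDefault change, but it
// can also be True by coincidence when the OS code page happens to be
// equal to DefaultSystemCodePage.
function DefaultMatchesSystemEncoding: Boolean;
begin
  Result := TEncoding.Default.CodePage = TEncoding.SystemEncoding.CodePage;
end;

begin
  WriteLn('DefaultSystemCodePage             : ', DefaultSystemCodePage);
  WriteLn('TEncoding.SystemEncoding.CodePage : ', TEncoding.SystemEncoding.CodePage);
  WriteLn('TEncoding.Default.CodePage        : ', TEncoding.Default.CodePage);
  WriteLn('TEncoding.ANSI.CodePage           : ', TEncoding.ANSI.CodePage);
  WriteLn('Default matches SystemEncoding    : ', DefaultMatchesSystemEncoding);
end.
```

Running it once as a plain FPC program and once from a Lazarus project should show whether the Default value follows DefaultSystemCodePage in both setups.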
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Fri, 27 Dec 2019 12:01:24 +0100 Ondrej Pokorny wrote:
>[...]
> > If I understand your reasoning correct, that should solve the
> > problems you report ?
>
> Yes, that perfectly solves the issues Lazarus developers and users
> face. I am OK with this solution as well. Thanks!

Thank you both \O/

Mattias
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 27.12.2019 10:40, Michael Van Canneyt wrote:
> > Yes, indeed. Therefore I suggested
> >
> > * TEncoding.Default for the DefaultSystemCodePage variable and
> > * TEncoding.ANSI for the system encoding.
> >
> > Currently we have
> >
> > * TEncoding.SystemEncoding for the DefaultSystemCodePage variable and
> > * both TEncoding.ANSI and TEncoding.Default for the system encoding.
> >
> > (TEncoding.ANSI and TEncoding.Default are equal in FPC.)
>
> In that case, why not simply change:
>
>   class function TEncoding.GetDefault: TEncoding;
>   begin
>     Result := GetSystemEncoding;
>   end;
>
> Nothing need be removed.
>
> I consider SystemEncoding a better name than Default, and the latter should only be kept for Delphi compatibility. IMHO it would be better to avoid Default, in fact I would change references to Default to SystemEncoding for clarity. Default is completely non-descriptive.
>
> If I understand your reasoning correctly, that should solve the problems you report?

Yes, that perfectly solves the issues Lazarus developers and users face. I am OK with this solution as well. Thanks!

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Fri, 27 Dec 2019, Ondrej Pokorny wrote:
> On 27.12.2019 0:19, Michael Van Canneyt wrote:
> > On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
> > > On 26.12.2019 19:29, Michael Van Canneyt wrote:
> > > > So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.
> > >
> > > That would be possible as well. But still please reconsider it:
> > >
> > > One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.
> > >
> > > Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.
> >
> > As far as I know, TEncoding.ANSI = CP_ACP.
>
> This is indeed not correct. See https://wiki.freepascal.org/FPC_Unicode_support :
>
> CP_ACP: this value represents the currently set "default system code page". See #Code page settings for more information.

I meant the windows meaning of CP_ACP, not what the RTL makes of it.

I think the use of CP_ACP in the RTL is quite dubious. Using CP_SYSTEM or so would have been better. No doubt again a Delphi compatibility naming :(

> TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))

This corresponds to what I meant.

> and
>
>   TStandardCodePageEnum = (
>     scpAnsi, // system Ansi code page (GetACP on windows)
>
> - as you can see the CP_ACP value does not correspond with the GetACP WinAPI call result. (But this is wanted as documented in https://wiki.freepascal.org/FPC_Unicode_support ).
>
> > Why should this equal TEncoding.Default ?
>
> sysencoding.inc:
>
>   class function TEncoding.GetDefault: TEncoding;
>   begin
>     Result := GetANSI;
>   end;

I know it is currently so, the question is: why? :)

Maybe Default is better SystemEncoding, see below.

> > I think TEncoding.Default = CP_UTF8 on linux ?
>
> Yes, in FPC this is correct. Also TEncoding.ANSI = CP_UTF8 on linux in FPC.

Not necessarily, if I read the wiki page correctly.

> > The main problem I see is that there is the system (OS) encoding, and the encoding specified by DefaultSystemCodePage. These do not necessarily agree.
> >
> > So it makes sense to have 2 TEncodings: one for the system encoding, one for the DefaultSystemCodePage variable. They will not be equal. If they were, then the DefaultSystemCodePage variable makes no sense whatever.
>
> Yes, indeed. Therefore I suggested
>
> * TEncoding.Default for the DefaultSystemCodePage variable and
> * TEncoding.ANSI for the system encoding.
>
> Currently we have
>
> * TEncoding.SystemEncoding for the DefaultSystemCodePage variable and
> * both TEncoding.ANSI and TEncoding.Default for the system encoding.
>
> (TEncoding.ANSI and TEncoding.Default are equal in FPC.)

In that case, why not simply change:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetSystemEncoding;
  end;

Nothing need be removed.

I consider SystemEncoding a better name than Default, and the latter should only be kept for Delphi compatibility. IMHO it would be better to avoid Default, in fact I would change references to Default to SystemEncoding for clarity. Default is completely non-descriptive.

If I understand your reasoning correctly, that should solve the problems you report?

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 23:42, Marco van de Voort wrote:
> On 12/26/2019 at 9:12 PM, Ondrej Pokorny wrote:
> > In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
> > http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
> > http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
> >
> > On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.
>
> And in FPC it is exactly the same,

No, it is not. In FPC:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetANSI;
  end;

For the meaning of TEncoding.Default and TEncoding.ANSI in Delphi see the docs above.

> BUT Lazarus overrides default with UTF8 on Windows.

Yes, it does it since r61976 - so only recently. And it is a very questionable commit because it

a) is not Delphi compatible
b) breaks OS-ANSI calls
c) breaks ANSI FPC code

It must either be reverted or we need some high-level method to get the OS-ANSI codepage without this override.

> As you can see that is NOT compatible with Delphi above.

Yes, and I am against r61976 - but because r61976 overrides TEncoding.ANSI to UTF-8 on Windows. IMO TEncoding.Default should be UTF-8 in Lazarus even on Windows (whereas TEncoding.ANSI should stay OS-ANSI) - I try to explain again why this actually fits very well into the Delphi/FPC encoding concept. I will now talk only about Windows for simplicity (because the ANSI concept is most important on Windows):

Delphi doesn't know the DefaultSystemEncoding concept that FPC has. The default AnsiString encoding in Delphi is always OS-ANSI (CP_ACP). Therefore it makes perfect sense to have TEncoding.Default point to the default AnsiString encoding, which is the OS-ANSI encoding in Delphi.

FPC, on the contrary, overrides the CP_ACP value with DefaultSystemEncoding. So the default AnsiString encoding is not OS-ANSI but DefaultSystemEncoding. Therefore, again, it makes perfect sense to have TEncoding.Default point to the default AnsiString encoding, which is DefaultSystemEncoding in FPC.

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 27.12.2019 0:19, Michael Van Canneyt wrote:
> On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
> > On 26.12.2019 19:29, Michael Van Canneyt wrote:
> > > So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.
> >
> > That would be possible as well. But still please reconsider it:
> >
> > One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.
> >
> > Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.
>
> As far as I know, TEncoding.ANSI = CP_ACP.

This is indeed not correct. See https://wiki.freepascal.org/FPC_Unicode_support :

CP_ACP: this value represents the currently set "default system code page". See #Code page settings for more information.

The code for it is in sysos.inc:

  function TranslatePlaceholderCP(cp: TSystemCodePage): TSystemCodePage; {$ifdef SYSTEMINLINE}inline;{$endif}
  begin
    TranslatePlaceholderCP:=cp;
    case cp of
      CP_OEMCP:
        TranslatePlaceholderCP:=GetOEMCP;
      CP_ACP:
        TranslatePlaceholderCP:=DefaultSystemCodePage;
    end;
  end;

Whereas TEncoding.ANSI is the WIN-ANSI OS encoding:

  class function TEncoding.GetANSI: TEncoding;
  // ...
    FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))

and

  TStandardCodePageEnum = (
    scpAnsi, // system Ansi code page (GetACP on windows)

- as you can see the CP_ACP value does not correspond with the GetACP WinAPI call result. (But this is wanted as documented in https://wiki.freepascal.org/FPC_Unicode_support ).

> Why should this equal TEncoding.Default ?

sysencoding.inc:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetANSI;
  end;

> I think TEncoding.Default = CP_UTF8 on linux ?

Yes, in FPC this is correct. Also TEncoding.ANSI = CP_UTF8 on linux in FPC.

> The main problem I see is that there is the system (OS) encoding, and the encoding specified by DefaultSystemCodePage. These do not necessarily agree.
>
> So it makes sense to have 2 TEncodings: one for the system encoding, one for the DefaultSystemCodePage variable. They will not be equal. If they were, then the DefaultSystemCodePage variable makes no sense whatever.

Yes, indeed. Therefore I suggested

* TEncoding.Default for the DefaultSystemCodePage variable and
* TEncoding.ANSI for the system encoding.

Currently we have

* TEncoding.SystemEncoding for the DefaultSystemCodePage variable and
* both TEncoding.ANSI and TEncoding.Default for the system encoding.

(TEncoding.ANSI and TEncoding.Default are equal in FPC.)

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
> On 26.12.2019 19:29, Michael Van Canneyt wrote:
> > So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.
>
> That would be possible as well. But still please reconsider it:
>
> One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.
>
> Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.

As far as I know, TEncoding.ANSI = CP_ACP.

Why should this equal TEncoding.Default ?

I think TEncoding.Default = CP_UTF8 on linux ?

The main problem I see is that there is the system (OS) encoding, and the encoding specified by DefaultSystemCodePage. These do not necessarily agree.

So it makes sense to have 2 TEncodings: one for the system encoding, one for the DefaultSystemCodePage variable. They will not be equal. If they were, then the DefaultSystemCodePage variable makes no sense whatever.

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 12/26/2019 at 9:12 PM, Ondrej Pokorny wrote:
> In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
>
> On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.

And in FPC it is exactly the same, BUT Lazarus overrides default with UTF8 on Windows. As you can see that is NOT compatible with Delphi above.

Worse, since the startup encoding is the encoding to communicate with the OS, as soon as

> Read the .NET docs about Encoding.Default: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN=netframework-4.8#System_Text_Encoding_Default on .NET Framework it is ANSI but on .NET Core it is UTF-8 even on

Yes, totally irrelevant. On Windows ansi means something like Windows-1252 and -A apis, and the only unicode api is -W and UTF8. .NET is as relevant as Linux in this matter; other application API.

> With all the information from the docs, I am more and more convinced that TEncoding.SystemEncoding is superfluous and TEncoding.Default should take over its meaning: TEncoding.Default should reflect changes in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed ANSI code page. With it there is no need for TEncoding.SystemEncoding.

The defaultsystemencoding changes the meaning of the codepage for the application libraries (read: the pascal parts), NOT for the delphi api.

> With this change, in the current Lazarus UTF-8 solution, TEncoding.Default will be UTF-8. In the future Unicode and Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi meaning (ANSI/UTF-8). IMO the concept is very sensible.

Delphi is UTF-16. UTF-8 is only used for document formats, not for APIs.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, Dec 26, 2019 at 9:12 PM Ondrej Pokorny wrote:
> With all the information from the docs, I am more and more convinced
> that TEncoding.SystemEncoding is superfluous and TEncoding.Default
> should take over its meaning: TEncoding.Default should reflect changes
> in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed
> ANSI code page. With it there is no need for TEncoding.SystemEncoding.

I agree with Ondrej on this point.

> With this change, in the current Lazarus UTF-8 solution,
> TEncoding.Default will be UTF-8. In the future Unicode and
> Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi
> meaning (ANSI/UTF-8). IMO the concept is very sensible.

It would make life much easier for the Lazarus developers. Currently we're kind of fighting the compiler, which is not good.

--
Bart
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 19:29, Michael Van Canneyt wrote:
> So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.

That would be possible as well. But still please reconsider it:

One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.

Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.

---

In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI

On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.

Read the .NET docs about Encoding.Default: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN=netframework-4.8#System_Text_Encoding_Default on .NET Framework it is ANSI but on .NET Core it is UTF-8 even on Windows.

With all the information from the docs, I am more and more convinced that TEncoding.SystemEncoding is superfluous and TEncoding.Default should take over its meaning: TEncoding.Default should reflect changes in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed ANSI code page. With it there is no need for TEncoding.SystemEncoding.

With this change, in the current Lazarus UTF-8 solution, TEncoding.Default will be UTF-8. In the future Unicode and Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi meaning (ANSI/UTF-8). IMO the concept is very sensible.

---

Btw. you have a bug in:

  constructor TStringStream.CreateRaw(const AString: RawByteString);
  var
    CP: TSystemCodePage;
  begin
    CP:=StringCodePage(AString);
    if (CP=CP_ACP) or (CP=TEncoding.Default.CodePage) then // this line is wrong
      begin
      FEncoding:=TEncoding.Default;
      FOwnsEncoding:=False;
      end
    else

In the code above, TEncoding.Default is used if CP=CP_ACP. That is currently wrong - the bug perfectly reflects my suggestion for TEncoding.Default change. Currently, CP_ACP corresponds with DefaultSystemEncoding and thus with TEncoding.SystemEncoding and not TEncoding.Default. TEncoding.Default corresponds with ANSI (that is not CP_ACP as documented https://wiki.freepascal.org/FPC_Unicode_support ).

The code should be:

    if (CP=CP_ACP) or (CP=TEncoding.SystemEncoding.CodePage) then
      begin
      FEncoding:=TEncoding.SystemEncoding;
      FOwnsEncoding:=False;
      end
    else if (CP=TEncoding.Default.CodePage) then
      begin
      FEncoding:=TEncoding.Default;
      FOwnsEncoding:=False;
      end
    else
      // ...

The current CreateRaw code is correct for my suggestion. As you can see you intuitively expected the approach I am suggesting :)

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
> Hello,
>
> a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.

That this was going to create problems and require code changes in user code was clear from the start.

> I suggest a compromise (steps):
>
> 1.) Keep TEncoding.ANSI always WIN-ANSI and Delphi-compatible. (Don't change it to DefaultSystemCodePage in Lazarus.)
>
> 2.) Change TEncoding.Default value to current TEncoding.SystemEncoding. I.e. TEncoding.Default would correspond to DefaultSystemCodePage and CP_ACP. Yes, this will be Delphi-incompatible - but CP_ACP is Delphi-incompatible as well (!) - so the incompatibilities are consistent here.
>
> 3.) Delete TEncoding.SystemEncoding because it is an FPC-only construct, it is not needed anymore (because it will become TEncoding.Default) and it has not been released in any stable version.

TEncoding.SystemEncoding was introduced to reflect changes in DefaultSystemCodePage whereas TEncoding.Default does not change, it reflects a fixed code page.

What I think should be done is make sure TEncoding.Default is initialized in the sysutils unit initialization, so it is the actual system default.

So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.

Michael.
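As an illustration of the kind of user-code change mentioned above, code that knows the encoding of its files can name it explicitly and so stay independent of whatever TEncoding.Default resolves to. The sketch below assumes the files are UTF-8; CopyAsUtf8 is a hypothetical helper name.

```pascal
program ExplicitEncoding;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Copies a text file through a TStringList, naming the encoding on both
// the load and the save side so that no default encoding is involved.
procedure CopyAsUtf8(const ASource, ATarget: string);
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(ASource, TEncoding.UTF8);
    // Decide about the BOM explicitly instead of relying on a default.
    SL.WriteBOM := False;
    SL.SaveToFile(ATarget, TEncoding.UTF8);
  finally
    SL.Free;
  end;
end;

begin
  if ParamCount >= 2 then
    CopyAsUtf8(ParamStr(1), ParamStr(2));
end.
```

This does not settle which default is the right one; it only sidesteps the question for code where the file encoding is known.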
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 17:02, Mattias Gaertner via fpc-devel wrote:
> On Thu, 26 Dec 2019 16:55:04 +0100 Ondrej Pokorny wrote:
> > On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
> > > On Thu, 26 Dec 2019 16:15:03 +0100 Ondrej Pokorny wrote:
> > > > Hello,
> > > >
> > > > a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.
> > > >
> > > > Currently, the no-encoding overloads of TStrings.LoadFrom*() and TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and not DefaultSystemCodePage.
> > >
> > > It seems FPC 3.3.1 does use DefaultSystemCodePage:
> > >
> > > class function TEncoding.GetANSI: TEncoding;
> > > begin
> > >   if not Assigned(FStandardEncodings[seAnsi]) then
> > >   begin
> > >     // DefaultSystemCodePage can be set to non-ANSI
> > >     if Assigned(widestringmanager.GetStandardCodePageProc) then
> > >       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
> > >     else
> > >       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
> > >     ...
> > > end;
> >
> > Check the code more carefully. It uses DefaultSystemCodePage only when no widestringmanager is present - which is basically never the case (at least on win32, Linux, Mac OS).
> >
> > It uses widestringmanager.GetStandardCodePageProc(scpAnsi) that is WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS language version).
>
> Yes, I just saw it. Bummer.

The comment

  // DefaultSystemCodePage can be set to non-ANSI

is misleading and corresponds to neither the code nor the currently desired behavior: https://bugs.freepascal.org/view.php?id=32961#c115162

I deleted it.

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019 16:55:04 +0100 Ondrej Pokorny wrote:
> On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
> > On Thu, 26 Dec 2019 16:15:03 +0100
> > Ondrej Pokorny wrote:
> >
> >> Hello,
> >>
> >> a lot of people have a problem with the TStrings.LoadFrom*()
> >> changes when TEncoding support was added.
> >>
> >> Currently, the no-encoding overloads of TStrings.LoadFrom*() and
> >> TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and
> >> not DefaultSystemCodePage.
> > It seems FPC 3.3.1 does use DefaultSystemCodePage:
> >
> > class function TEncoding.GetANSI: TEncoding;
> > begin
> >   if not Assigned(FStandardEncodings[seAnsi]) then
> >   begin
> >     // DefaultSystemCodePage can be set to non-ANSI
> >     if Assigned(widestringmanager.GetStandardCodePageProc) then
> >       FStandardEncodings[seAnsi] :=
> >         TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
> >     else
> >       FStandardEncodings[seAnsi] :=
> >         TMBCSEncoding.Create(DefaultSystemCodePage);
> >     ...
> > end;
>
> Check the code more carefully. It uses DefaultSystemCodePage only
> when no widestringmanager is present - which is basically never the
> case (at least on win32, Linux, Mac OS).
>
> It uses widestringmanager.GetStandardCodePageProc(scpAnsi) that is
> WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS
> language version).

Yes, I just saw it. Bummer.

Mattias
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
> On Thu, 26 Dec 2019 16:15:03 +0100 Ondrej Pokorny wrote:
> > Hello,
> >
> > a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.
> >
> > Currently, the no-encoding overloads of TStrings.LoadFrom*() and TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and not DefaultSystemCodePage.
>
> It seems FPC 3.3.1 does use DefaultSystemCodePage:
>
> class function TEncoding.GetANSI: TEncoding;
> begin
>   if not Assigned(FStandardEncodings[seAnsi]) then
>   begin
>     // DefaultSystemCodePage can be set to non-ANSI
>     if Assigned(widestringmanager.GetStandardCodePageProc) then
>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
>     else
>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
>     ...
> end;

Check the code more carefully. It uses DefaultSystemCodePage only when no widestringmanager is present - which is basically never the case (at least on win32, Linux, Mac OS).

It uses widestringmanager.GetStandardCodePageProc(scpAnsi) that is WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS language version).

Ondrej
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019 16:15:03 +0100 Ondrej Pokorny wrote:
> Hello,
>
> a lot of people have a problem with the TStrings.LoadFrom*() changes
> when TEncoding support was added.
>
> Currently, the no-encoding overloads of TStrings.LoadFrom*() and
> TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and
> not DefaultSystemCodePage.

It seems FPC 3.3.1 does use DefaultSystemCodePage:

class function TEncoding.GetANSI: TEncoding;
begin
  if not Assigned(FStandardEncodings[seAnsi]) then
  begin
    // DefaultSystemCodePage can be set to non-ANSI
    if Assigned(widestringmanager.GetStandardCodePageProc) then
      FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
    else
      FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
    ...
end;

Maybe you are querying TEncoding.Default before changing DefaultSystemCodePage?

Mattias