Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 23:42, Marco van de Voort wrote:
> On 12/26/2019 at 9:12 PM, Ondrej Pokorny wrote:
>> In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
>> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
>> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
>> On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.
>
> And in FPC it is exactly the same,

No, it is not. In FPC:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetANSI;
  end;

For the meaning of TEncoding.Default and TEncoding.ANSI in Delphi see the docs above.

> BUT Lazarus overrides default with UTF8 on Windows.

Yes, it has done so since r61976 - so only recently. And it is a very questionable commit because it
a) is not Delphi-compatible,
b) breaks OS-ANSI calls,
c) breaks ANSI FPC code.

It must either be reverted or we need some high-level method to get the OS-ANSI code page without this override.

> As you can see that is NOT compatible with Delphi above.

Yes, and I am against r61976 - but because r61976 overrides TEncoding.ANSI to UTF-8 on Windows.

IMO TEncoding.Default should be UTF-8 in Lazarus even on Windows (whereas TEncoding.ANSI should stay OS-ANSI). I will try to explain again why this actually fits very well into the Delphi/FPC encoding concept. I will talk only about Windows for simplicity (because the ANSI concept matters most on Windows):

Delphi doesn't know the DefaultSystemEncoding concept that FPC has. The default AnsiString encoding in Delphi is always OS-ANSI (CP_ACP). Therefore it makes perfect sense for TEncoding.Default to point to the default AnsiString encoding, which is the OS-ANSI encoding in Delphi.

FPC, on the contrary, overrides the CP_ACP value with DefaultSystemEncoding. So the default AnsiString encoding is not OS-ANSI but DefaultSystemEncoding. Therefore, again, it makes perfect sense for TEncoding.Default to point to the default AnsiString encoding, which is DefaultSystemEncoding in FPC.

Ondrej
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 27.12.2019 0:19, Michael Van Canneyt wrote:
> On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
>> On 26.12.2019 19:29, Michael Van Canneyt wrote:
>>> So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.
>>
>> That would be possible as well. But still please reconsider it:
>> One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.
>> Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.
>
> As far as I know, TEncoding.ANSI = CP_ACP.

This is indeed not the case. See https://wiki.freepascal.org/FPC_Unicode_support :

  CP_ACP: this value represents the currently set "default system code page". See #Code page settings for more information.

The code for it is in sysos.inc:

  function TranslatePlaceholderCP(cp: TSystemCodePage): TSystemCodePage; {$ifdef SYSTEMINLINE}inline;{$endif}
  begin
    TranslatePlaceholderCP:=cp;
    case cp of
      CP_OEMCP:
        TranslatePlaceholderCP:=GetOEMCP;
      CP_ACP:
        TranslatePlaceholderCP:=DefaultSystemCodePage;
    end;
  end;

Whereas TEncoding.ANSI is the WIN-ANSI OS encoding:

  class function TEncoding.GetANSI: TEncoding;
  // ...
    FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))

and

  TStandardCodePageEnum = (
    scpAnsi, // system Ansi code page (GetACP on windows)

As you can see, the CP_ACP value does not correspond to the result of the GetACP WinAPI call. (But this is intended, as documented in https://wiki.freepascal.org/FPC_Unicode_support .)

> Why should this equal TEncoding.Default ?

sysencoding.inc:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetANSI;
  end;

> I think TEncoding.Default = CP_UTF8 on linux ?

Yes, in FPC this is correct. Also TEncoding.ANSI = CP_UTF8 on Linux in FPC.

> The main problem I see is that there is the system (OS) encoding, and the encoding specified by DefaultSystemCodePage. These do not necessarily agree.
> So it makes sense to have 2 TEncodings: one for the system encoding, one for the DefaultSystemCodePage variable. They will not be equal. If they were, then the DefaultSystemCodePage variable makes no sense whatever.

Yes, indeed. Therefore I suggested
* TEncoding.Default for the DefaultSystemCodePage variable and
* TEncoding.ANSI for the system encoding.

Currently we have
* TEncoding.SystemEncoding for the DefaultSystemCodePage variable and
* both TEncoding.ANSI and TEncoding.Default for the system encoding. (TEncoding.ANSI and TEncoding.Default are equal in FPC.)

Ondrej
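To check the current mapping on a given install, one can print the code page behind each property discussed above. This is a minimal sketch, assuming an FPC version whose SysUtils has the FPC-only TEncoding.SystemEncoding (trunk 3.3.1 at the time of this thread). The printed numbers depend on the platform, the FPC revision and any Lazarus override, so no output is claimed here.

```pascal
program CompareEncodings;

{$mode objfpc}{$H+}

uses
  SysUtils;

begin
  WriteLn('DefaultSystemCodePage    : ', DefaultSystemCodePage);
  WriteLn('TEncoding.ANSI           : ', TEncoding.ANSI.CodePage);
  WriteLn('TEncoding.Default        : ', TEncoding.Default.CodePage);
  // FPC-only property; drop this line on versions that lack it
  WriteLn('TEncoding.SystemEncoding : ', TEncoding.SystemEncoding.CodePage);
  // Object identity, not just equal code pages: on builds where
  // GetDefault simply returns GetANSI (as quoted above), this prints TRUE
  WriteLn('Default is ANSI instance : ', TEncoding.Default = TEncoding.ANSI);
end.
```

The identity check is there because two properties can report the same code page while still being separate objects; the thread's point is that, with GetDefault returning GetANSI, they are literally the same instance.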
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
> On 26.12.2019 19:29, Michael Van Canneyt wrote:
>> So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.
>
> That would be possible as well. But still please reconsider it:
> One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.
> Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.

As far as I know, TEncoding.ANSI = CP_ACP. Why should this equal TEncoding.Default ?

I think TEncoding.Default = CP_UTF8 on linux ?

The main problem I see is that there is the system (OS) encoding, and the encoding specified by DefaultSystemCodePage. These do not necessarily agree.

So it makes sense to have 2 TEncodings: one for the system encoding, one for the DefaultSystemCodePage variable. They will not be equal. If they were, then the DefaultSystemCodePage variable makes no sense whatever.

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 12/26/2019 at 9:12 PM, Ondrej Pokorny wrote:
> In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
> http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
> On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.

And in FPC it is exactly the same, BUT Lazarus overrides default with UTF8 on Windows. As you can see that is NOT compatible with Delphi above.

Worse, since the startup encoding is the encoding to communicate with the OS, as soon as

> Read the .NET docs about Encoding.Default: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN=netframework-4.8#System_Text_Encoding_Default
> on .NET Framework it is ANSI but on .NET Core it is UTF-8 even on Windows.

Yes, totally irrelevant. On Windows, ANSI means something like Windows-1252 and the -A APIs, and the only Unicode API is -W and UTF8. .NET is as relevant as Linux in this matter; other application API.

> With all the information from the docs, I am more and more convinced that TEncoding.SystemEncoding is superfluous and TEncoding.Default should take over its meaning: TEncoding.Default should reflect changes in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed ANSI code page. With it there is no need for TEncoding.SystemEncoding.

The DefaultSystemEncoding changes the meaning of the codepage for the application libraries (read: the Pascal parts), NOT for the Delphi API.

> With this change, in the current Lazarus UTF-8 solution, TEncoding.Default will be UTF-8. In the future Unicode and Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi meaning (ANSI/UTF-8). IMO the concept is very sensible.

Delphi is UTF-16. UTF-8 is only used for document formats, not for APIs.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, Dec 26, 2019 at 9:12 PM Ondrej Pokorny wrote:

> With all the information from the docs, I am more and more convinced
> that TEncoding.SystemEncoding is superfluous and TEncoding.Default
> should take over its meaning: TEncoding.Default should reflect changes
> in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed
> ANSI code page. With it there is no need for TEncoding.SystemEncoding.

I agree with Ondrej on this point.

> With this change, in the current Lazarus UTF-8 solution,
> TEncoding.Default will be UTF-8. In the future Unicode and
> Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi
> meaning (ANSI/UTF-8). IMO the concept is very sensible.

It would make life much easier for the Lazarus developers.
Currently we're kind of fighting the compiler, which is not good.

--
Bart
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 19:29, Michael Van Canneyt wrote:
> So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.

That would be possible as well. But still please reconsider it:

One reason: just from the convention - the default codepage to use should be TEncoding.Default. That is intuitive.

Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 equal properties. And another FPC-only property TEncoding.SystemEncoding. That means 3 properties for 2 values.

---

In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
On Windows, they are equal but on POSIX they are different: TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from CFLocaleGetIdentifier.

Read the .NET docs about Encoding.Default: https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN=netframework-4.8#System_Text_Encoding_Default
On .NET Framework it is ANSI but on .NET Core it is UTF-8 even on Windows.

With all the information from the docs, I am more and more convinced that TEncoding.SystemEncoding is superfluous and TEncoding.Default should take over its meaning: TEncoding.Default should reflect changes in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed ANSI code page. With it there is no need for TEncoding.SystemEncoding.

With this change, in the current Lazarus UTF-8 solution, TEncoding.Default will be UTF-8. In the future Unicode and Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi meaning (ANSI/UTF-8). IMO the concept is very sensible.

---

Btw. you have a bug in:

  constructor TStringStream.CreateRaw(const AString: RawByteString);
  var
    CP: TSystemCodePage;
  begin
    CP:=StringCodePage(AString);
    if (CP=CP_ACP) or (CP=TEncoding.Default.CodePage) then // this line is wrong
      begin
      FEncoding:=TEncoding.Default;
      FOwnsEncoding:=False;
      end
    else

In the code above, TEncoding.Default is used if CP=CP_ACP. That is currently wrong - the bug perfectly reflects my suggestion for the TEncoding.Default change. Currently, CP_ACP corresponds to DefaultSystemEncoding and thus to TEncoding.SystemEncoding, not to TEncoding.Default. TEncoding.Default corresponds to ANSI (which is not CP_ACP, as documented at https://wiki.freepascal.org/FPC_Unicode_support ).

The code should be:

    if (CP=CP_ACP) or (CP=TEncoding.SystemEncoding.CodePage) then
      begin
      FEncoding:=TEncoding.SystemEncoding;
      FOwnsEncoding:=False;
      end
    else if (CP=TEncoding.Default.CodePage) then
      begin
      FEncoding:=TEncoding.Default;
      FOwnsEncoding:=False;
      end
    else
    // ...

The current CreateRaw code is correct for my suggestion. As you can see, you intuitively expected the approach I am suggesting :)

Ondrej
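The ordering Ondrej proposes for CreateRaw can be tried outside TStringStream with a small stand-alone program. SharedEncodingFor is a hypothetical helper, not an RTL function: it returns the shared instance CreateRaw would reuse for a code page, or nil when a private instance is needed. The fallback to TMBCSEncoding.Create for other code pages is an assumption, since the post elides that branch; the sketch also assumes TEncoding.SystemEncoding is available.

```pascal
program CreateRawSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper (not part of the RTL): shared TEncoding that the
// proposed CreateRaw would reuse for code page CP, or nil if none applies.
function SharedEncodingFor(CP: TSystemCodePage): TEncoding;
begin
  if (CP = CP_ACP) or (CP = TEncoding.SystemEncoding.CodePage) then
    // CP_ACP is a placeholder for DefaultSystemCodePage in FPC,
    // which is what TEncoding.SystemEncoding represents
    Result := TEncoding.SystemEncoding
  else if CP = TEncoding.Default.CodePage then
    Result := TEncoding.Default
  else
    Result := nil;
end;

var
  S: RawByteString;
  CP: TSystemCodePage;
  Enc: TEncoding;
  OwnsEncoding: Boolean;
begin
  S := 'abc';
  // Tag the bytes as UTF-8 without converting them
  SetCodePage(S, CP_UTF8, False);
  CP := StringCodePage(S);

  Enc := SharedEncodingFor(CP);
  OwnsEncoding := Enc = nil;
  if OwnsEncoding then
    // Assumed fallback: the post elides this branch
    Enc := TMBCSEncoding.Create(CP);
  try
    WriteLn('Code page ', CP, ' -> ', Enc.ClassName, ' (owned: ', OwnsEncoding, ')');
  finally
    // Shared RTL instances must not be freed; only a private one is
    if OwnsEncoding then
      Enc.Free;
  end;
end.
```

Checking SystemEncoding before Default matters: where both report the same code page, the string is attributed to the encoding that follows DefaultSystemCodePage, which is the behaviour the corrected snippet aims for.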
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019, Ondrej Pokorny wrote:

> Hello,
>
> a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.

That this was going to create problems and require code changes in user code was clear from the start.

> I suggest a compromise (steps):
>
> 1.) Keep TEncoding.ANSI always WIN-ANSI and Delphi-compatible. (Don't change it to DefaultSystemCodePage in Lazarus.)
>
> 2.) Change TEncoding.Default value to current TEncoding.SystemEncoding. I.e. TEncoding.Default would correspond to DefaultSystemCodePage and CP_ACP. Yes, this will be Delphi-incompatible - but CP_ACP is Delphi-incompatible as well (!) - so the incompatibilities are consistent here.
>
> 3.) Delete TEncoding.SystemEncoding because it is an FPC-only construct, it is not needed anymore (because it will become TEncoding.Default) and it has not been released in any stable version.

TEncoding.SystemEncoding was introduced to reflect changes in DefaultSystemCodePage, whereas TEncoding.Default does not change; it reflects a fixed code page.

What I think should be done is make sure TEncoding.Default is initialized in the sysutils unit initialization, so it is the actual system default.

So no, I don't think these need to be changed/merged. What IMO can be discussed is which of these 2 need to be used as the default codepage in other code. It should then resolve the problems that appear, I think.

Michael.
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 17:02, Mattias Gaertner via fpc-devel wrote:
> On Thu, 26 Dec 2019 16:55:04 +0100 Ondrej Pokorny wrote:
>> On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
>>> On Thu, 26 Dec 2019 16:15:03 +0100 Ondrej Pokorny wrote:
>>>> Hello,
>>>>
>>>> a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.
>>>>
>>>> Currently, the no-encoding overloads of TStrings.LoadFrom*() and TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and not DefaultSystemCodePage.
>>>
>>> It seems FPC 3.3.1 does use DefaultSystemCodePage:
>>>
>>> class function TEncoding.GetANSI: TEncoding;
>>> begin
>>>   if not Assigned(FStandardEncodings[seAnsi]) then
>>>   begin
>>>     // DefaultSystemCodePage can be set to non-ANSI
>>>     if Assigned(widestringmanager.GetStandardCodePageProc) then
>>>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
>>>     else
>>>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
>>>     ...
>>> end;
>>
>> Check the code more carefully. It uses DefaultSystemCodePage only when no widestringmanager is present - which is basically never the case (at least on win32, Linux, Mac OS).
>>
>> It uses widestringmanager.GetStandardCodePageProc(scpAnsi) that is WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS language version).
>
> Yes, I just saw it. Bummer.

The comment

  // DefaultSystemCodePage can be set to non-ANSI

is misleading and corresponds neither to the code nor to the currently desired behavior: https://bugs.freepascal.org/view.php?id=32961#c115162

I deleted it.

Ondrej
Re: [fpc-devel] Merry Christmas!
Merry Christmas everyone. Keep up the amazing work.

On Wed 25 Dec 2019, 20:35 J. Gareth Moreton, wrote:
> Merry Christmas, FPC developers!
>
> Gareth aka. Kit
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019 16:55:04 +0100
Ondrej Pokorny wrote:

> On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
> > On Thu, 26 Dec 2019 16:15:03 +0100
> > Ondrej Pokorny wrote:
> >
> >> Hello,
> >>
> >> a lot of people have a problem with the TStrings.LoadFrom*()
> >> changes when TEncoding support was added.
> >>
> >> Currently, the no-encoding overloads of TStrings.LoadFrom*() and
> >> TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and
> >> not DefaultSystemCodePage.
> > It seems FPC 3.3.1 does use DefaultSystemCodePage:
> >
> > class function TEncoding.GetANSI: TEncoding;
> > begin
> >
> >   if not Assigned(FStandardEncodings[seAnsi]) then
> >   begin
> >     // DefaultSystemCodePage can be set to non-ANSI
> >     if Assigned(widestringmanager.GetStandardCodePageProc) then
> >       FStandardEncodings[seAnsi] :=
> >         TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
> >     else
> >       FStandardEncodings[seAnsi] :=
> >         TMBCSEncoding.Create(DefaultSystemCodePage);
> >     ...
> > end;
>
> Check the code more carefully. It uses DefaultSystemCodePage only
> when no widestringmanager is present - which is basically never the
> case (at least on win32, Linux, Mac OS).
>
> It uses widestringmanager.GetStandardCodePageProc(scpAnsi) that is
> WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS
> language version).

Yes, I just saw it. Bummer.

Mattias
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On 26.12.2019 16:41, Mattias Gaertner via fpc-devel wrote:
> On Thu, 26 Dec 2019 16:15:03 +0100 Ondrej Pokorny wrote:
>> Hello,
>>
>> a lot of people have a problem with the TStrings.LoadFrom*() changes when TEncoding support was added.
>>
>> Currently, the no-encoding overloads of TStrings.LoadFrom*() and TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and not DefaultSystemCodePage.
>
> It seems FPC 3.3.1 does use DefaultSystemCodePage:
>
> class function TEncoding.GetANSI: TEncoding;
> begin
>   if not Assigned(FStandardEncodings[seAnsi]) then
>   begin
>     // DefaultSystemCodePage can be set to non-ANSI
>     if Assigned(widestringmanager.GetStandardCodePageProc) then
>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
>     else
>       FStandardEncodings[seAnsi] := TMBCSEncoding.Create(DefaultSystemCodePage);
>     ...
> end;

Check the code more carefully. It uses DefaultSystemCodePage only when no widestringmanager is present - which is basically never the case (at least on win32, Linux, Mac OS).

It uses widestringmanager.GetStandardCodePageProc(scpAnsi), which is WIN-ANSI on win32 (typically 1250, 1251, 1252 - depending on your OS language version).

Ondrej
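To see which code page TEncoding.ANSI actually receives on a given system, the branch from the quoted GetANSI can be replayed directly. This is a sketch that assumes the System unit exposes widestringmanager.GetStandardCodePageProc and scpAnsi as used in the code quoted above; on Windows it also prints the WinAPI GetACP result, so the OS-ANSI value can be compared with DefaultSystemCodePage.

```pascal
program AnsiCodePageSource;

{$mode objfpc}{$H+}

uses
  {$IFDEF MSWINDOWS}Windows,{$ENDIF}
  SysUtils;

begin
  // Replays the decision made in TEncoding.GetANSI quoted above
  if Assigned(widestringmanager.GetStandardCodePageProc) then
    WriteLn('GetStandardCodePageProc(scpAnsi): ',
      widestringmanager.GetStandardCodePageProc(scpAnsi))
  else
    WriteLn('No GetStandardCodePageProc: GetANSI falls back to DefaultSystemCodePage');

  WriteLn('DefaultSystemCodePage           : ', DefaultSystemCodePage);
  WriteLn('TEncoding.ANSI.CodePage         : ', TEncoding.ANSI.CodePage);
  {$IFDEF MSWINDOWS}
  // OS-ANSI code page as reported by the WinAPI
  WriteLn('GetACP                          : ', GetACP);
  {$ENDIF}
end.
```

On a Windows machine where something has overridden GetStandardCodePageProc, the first and last lines would disagree, which is the situation the thread describes for the Lazarus change.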
Re: [fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
On Thu, 26 Dec 2019 16:15:03 +0100
Ondrej Pokorny wrote:

> Hello,
>
> a lot of people have a problem with the TStrings.LoadFrom*() changes
> when TEncoding support was added.
>
> Currently, the no-encoding overloads of TStrings.LoadFrom*() and
> TStrings.SaveTo*() use the TEncoding.Default, which is WIN-ANSI and
> not DefaultSystemCodePage.

It seems FPC 3.3.1 does use DefaultSystemCodePage:

class function TEncoding.GetANSI: TEncoding;
begin

  if not Assigned(FStandardEncodings[seAnsi]) then
  begin
    // DefaultSystemCodePage can be set to non-ANSI
    if Assigned(widestringmanager.GetStandardCodePageProc) then
      FStandardEncodings[seAnsi] :=
        TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))
    else
      FStandardEncodings[seAnsi] :=
        TMBCSEncoding.Create(DefaultSystemCodePage);
    ...
end;

Maybe you are querying TEncoding.Default before changing DefaultSystemCodePage?

Mattias
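Mattias's question about query order can be checked by touching TEncoding.Default before and after changing DefaultSystemCodePage. A minimal sketch, assuming the caching shown in GetANSI above (the instance is created once and stored in FStandardEncodings) and an FPC version that has TEncoding.SystemEncoding. Assigning DefaultSystemCodePage directly is a stand-in for whatever a Lazarus application does at startup; the results depend on the FPC revision, so none are claimed.

```pascal
program QueryOrder;

{$mode objfpc}{$H+}

uses
  SysUtils;

var
  Before: TEncoding;
begin
  // Touch TEncoding.Default first, so a lazily created instance is cached now
  Before := TEncoding.Default;
  WriteLn('Default before change  : ', Before.CodePage);

  // Stand-in for a startup override of the default AnsiString code page
  DefaultSystemCodePage := CP_UTF8;

  WriteLn('Default after change   : ', TEncoding.Default.CodePage);
  WriteLn('Same instance as before: ', Before = TEncoding.Default);
  WriteLn('SystemEncoding after   : ', TEncoding.SystemEncoding.CodePage);
end.
```

If Default keeps its old code page while SystemEncoding follows the change, the query order is irrelevant and the value really comes from the wide string manager, as Ondrej points out in his reply.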
[fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
Hello,

a lot of people have a problem with the TStrings.LoadFrom*() changes made when TEncoding support was added.

Currently, the no-encoding overloads of TStrings.LoadFrom*() and TStrings.SaveTo*() use TEncoding.Default, which is WIN-ANSI and not DefaultSystemCodePage.

Before the changes, the encoding was simply ignored and the file was read without any code page support (thus in fact in DefaultSystemCodePage).

A lot of people have a problem with this because in Lazarus they used to read UTF-8 text with TStrings.LoadFrom*(). Now the text is broken because TStrings.LoadFrom*() loads the text as WIN-ANSI.

The Lazarus team came up with a kind of solution: they changed the FPC internal function widestringmanager.GetStandardCodePageProc to return UTF-8, which makes both TEncoding.Default and TEncoding.ANSI UTF-8. As a result, this broke all ANSI code that needs the real WIN-ANSI code page (like ODBC: https://bugs.freepascal.org/view.php?id=36481 ). (And it introduced more Delphi & other Lazarus incompatibilities just for the sake of keeping TStrings.LoadFrom*() legacy-Lazarus compatible.)

For me the Lazarus solution is not acceptable. But other Lazarus team members seem not to be happy with the breaking encoding change in FPC TStrings.LoadFrom*() and SaveTo*() and ask for a solution.

---

I suggest a compromise (steps):

1.) Keep TEncoding.ANSI always WIN-ANSI and Delphi-compatible. (Don't change it to DefaultSystemCodePage in Lazarus.)

2.) Change the TEncoding.Default value to the current TEncoding.SystemEncoding. I.e. TEncoding.Default would correspond to DefaultSystemCodePage and CP_ACP. Yes, this will be Delphi-incompatible - but CP_ACP is Delphi-incompatible as well (!) - so the incompatibilities are consistent here.

3.) Delete TEncoding.SystemEncoding because it is an FPC-only construct, it is not needed anymore (because it will become TEncoding.Default) and it has not been released in any stable version.

The above steps perfectly correlate with the CP_ACP value corresponding to DefaultSystemEncoding. As I wrote before, CP_ACP is not Delphi-compatible either. See https://wiki.freepascal.org/FPC_Unicode_support

It makes perfect sense to keep TEncoding consistent with CP_ACP and make TEncoding.Default correspond to DefaultSystemEncoding. It is also natural: I would expect TEncoding.Default to correspond to the default (Ansi)String encoding, which is DefaultSystemEncoding.

Ondrej
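Whichever way TEncoding.Default ends up going, code that must read or write UTF-8 can sidestep the question by passing the encoding explicitly. A sketch, assuming the TEncoding overloads of TStrings.SaveToFile and TStrings.LoadFromFile that came with the TEncoding support discussed in this thread. The file name comes from SysUtils.GetTempFileName, and the content is plain ASCII so that the example does not depend on the source file's own code page.

```pascal
program ExplicitEncoding;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

var
  FileName: string;
  Lines: TStringList;
begin
  FileName := GetTempFileName;
  Lines := TStringList.Create;
  try
    Lines.Add('plain ASCII line');
    // Name the encoding explicitly instead of relying on TEncoding.Default
    Lines.SaveToFile(FileName, TEncoding.UTF8);

    Lines.Clear;
    // Same explicit encoding on the way back in
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    WriteLn(Lines.Count, ' line(s) read back');
  finally
    Lines.Free;
    DeleteFile(FileName);
  end;
end.
```

The design choice here is simply not to depend on a default whose meaning is still under discussion; the no-encoding overloads remain subject to whatever TEncoding.Default resolves to.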