Re: [Lazarus] Garbage writing to console
On Sat, Feb 24, 2018 at 5:17 AM, Juha Manninen via Lazarus wrote:
> On Fri, Feb 23, 2018 at 11:15 AM, R0b0t1 via Lazarus wrote:
>> Not to take the thread offtopic, ...
>
> You took it offtopic anyway.
> UTF-8 works just fine. Please read this page:
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> and start a new thread if you did not understand something.

It will not happen again, sir. I am a stain upon this mailing list.

My question was not about UTF-8 support in Lazarus. I know it works. My question was about interoperability with Windows. Theoretically there are some serious issues; I figured I would get a better response asking someone using UTF-8 with Windows compared to a general poll.

Remorsefully,
R0b0t1
--
___ Lazarus mailing list Lazarus@lists.lazarus-ide.org https://lists.lazarus-ide.org/listinfo/lazarus
Re: [Lazarus] Garbage writing to console
On Fri, Feb 23, 2018 at 11:15 AM, R0b0t1 via Lazarus wrote:
> Not to take the thread offtopic, ...

You took it offtopic anyway.
UTF-8 works just fine. Please read this page:
http://wiki.freepascal.org/Unicode_Support_in_Lazarus
and start a new thread if you did not understand something.

Juha
Re: [Lazarus] Garbage writing to console
On Fri, 23 Feb 2018 11:18:55 +0100 Mattias Gaertner via Lazarus wrote:
>[...]
> Write and Writeln are pure compiler magic.
> Apparently if the string is CP_ACP, it is passed unchanged to Output,
> ignoring the DefaultSystemCodepage variable.

I correct myself. ;)
My example of write(s) showed that CP_ACP string is converted to console codepage.
Passing a string literal to writeln works differently. It is passed unchanged to Output, ignoring the DefaultSystemCodepage variable.

Mattias
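The three code pages this message refers to (the one attached to the string, DefaultSystemCodePage, and the one of Output) can be printed with a small program. The following is only a sketch under assumptions: the source file is saved as UTF-8 without BOM, no {$codepage} directive is used, and the call to SetMultiByteConversionCodePage is a stand-in for what the LazUTF8 unit is understood to do at startup. The program name is made up for illustration, and the values printed will depend on the platform and console settings.

```pascal
program ShowCodePages;
{$mode objfpc}{$H+}
// Assumption: this file is saved as UTF-8 without BOM and there is
// no {$codepage} directive, as in the original report.

var
  s: string;
begin
  // Stand-in (assumption) for the effect of the LazUTF8 unit:
  // make UTF-8 the default code page for AnsiString conversions.
  SetMultiByteConversionCodePage(CP_UTF8);

  s := 'áéí';

  // Code page the RTL assumes for strings tagged CP_ACP.
  writeln('DefaultSystemCodePage:   ', DefaultSystemCodePage);
  // Code page tag stored with the string variable.
  writeln('StringCodePage(s):       ', StringCodePage(s));
  // Code page that write/writeln convert to for this text file.
  writeln('GetTextCodePage(Output): ', GetTextCodePage(Output));
end.
```

Comparing the three numbers gives a hint whether writeln(s) has to convert anything before the bytes reach the console.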
Re: [Lazarus] Garbage writing to console
On 23.02.2018 11:18, Mattias Gaertner via Lazarus wrote:
> Write and Writeln are pure compiler magic.
> Apparently if the string is CP_ACP, it is passed unchanged to Output,
> ignoring the DefaultSystemCodepage variable. Otherwise it converts the
> string to the console codepage.
> This allows to write arbitrary bytes.
>
> Maybe some compiler guru can confirm this.

Compiler gurus are reading fpc-devel :)

On 22.02.2018 22:26, Ondrej Pokorny via Lazarus wrote:
> It looks like writeln ignores the codepage for constant strings but
> checks it for "normal" strings. You should ask on fpc-devel list for
> details.

Ondrej
Re: [Lazarus] Garbage writing to console
On Fri, 23 Feb 2018 10:56:11 +0100 Luca Olivetti via Lazarus wrote:
> On 23/02/18 at 09:23, Mattias Gaertner via Lazarus wrote:
> > On Fri, 23 Feb 2018 08:56:55 +0100
> > Ondrej Pokorny via Lazarus wrote:
> >
> >> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
> >>> But this is a Lazarus issue.
> >>
> >> Why?
> >
> > Because the wiki page is unclear.
> >
> > writeln('áéí'); // works only with source codepage UTF-8
>
> That's not enough, it has to be either utf8 with BOM or with
> {$codepage utf8}
> (Which has its own set of problems)

"source codepage UTF-8" = "utf8 with BOM" = "{$codepage UTF8}" = "-Fcutf8"

> > s:='áéí';
> > writeln(s); // works with and without
>
> Correct, but I don't understand why
>
> > writeln(UTF8ToConsole('áéí')); // works with and without
>
> Yes, that works too.

Write and Writeln are pure compiler magic.
Apparently if the string is CP_ACP, it is passed unchanged to Output, ignoring the DefaultSystemCodepage variable. Otherwise it converts the string to the console codepage.
This allows to write arbitrary bytes.

Maybe some compiler guru can confirm this.

Mattias
Re: [Lazarus] Garbage writing to console
On Fri, Feb 23, 2018 at 3:25 AM, Michael Van Canneyt via Lazarus wrote:
> On Fri, 23 Feb 2018, R0b0t1 via Lazarus wrote:
>
>> On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus wrote:
>>>
>>> OK, you mean it's a wiki page issue. I just didn't understand how we
>>> could solve it in Lazarus :)
>>>
>>
>> I am interested in this thread because I was under the impression that
>> UTF-8 support in Windows is fundamentally broken and should not be
>> used (it interferes with the C libraries).
>>
>> Not to take the thread offtopic, but can anyone comment on this in
>> practice?
>
> Where did you get that from ?
>
> You can perfectly use UTF8 in FPC code, but when calling a windows API,
> you should
> a) convert UTF8 to UTF16 (or WideString). If you use the correct types,
>    the compiler will do it for you most of the time.
> b) Use the *W variant of a Windows system call.

The combination of those is the largest part of why http://utf8everywhere.org/ and many independent developers recommend avoiding Windows' implementation of UTF-8. It doesn't end up doing you any good, because you typically cannot set the whole system to use UTF-8 (because it is broken).

The brokenness (described at https://social.msdn.microsoft.com/Forums/vstudio/en-US/e4b91f49-6f60-4ffe-887a-e18e39250905/possible-bugs-in-writefile-and-crt-unicode-issues?forum=vcgeneral) is due to the UTF-8 codepage causing Windows to report multibyte characters as a single character, and stdio assuming one byte per character. The Chinese/Japanese mappings apparently had these problems as well, but a workaround was added.

Cheers,
R0b0t1
Re: [Lazarus] Garbage writing to console
On 23/02/18 at 09:23, Mattias Gaertner via Lazarus wrote:
> On Fri, 23 Feb 2018 08:56:55 +0100 Ondrej Pokorny via Lazarus wrote:
>> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
>>> But this is a Lazarus issue.
>>
>> Why?
>
> Because the wiki page is unclear.
>
> writeln('áéí'); // works only with source codepage UTF-8

That's not enough, it has to be either utf8 with BOM or with {$codepage utf8}
(Which has its own set of problems)

> s:='áéí';
> writeln(s); // works with and without

Correct, but I don't understand why

> writeln(UTF8ToConsole('áéí')); // works with and without

Yes, that works too.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007
Re: [Lazarus] Garbage writing to console
On Fri, 23 Feb 2018, R0b0t1 via Lazarus wrote:
> On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus wrote:
>> OK, you mean it's a wiki page issue. I just didn't understand how we
>> could solve it in Lazarus :)
>
> I am interested in this thread because I was under the impression that
> UTF-8 support in Windows is fundamentally broken and should not be
> used (it interferes with the C libraries).
>
> Not to take the thread offtopic, but can anyone comment on this in
> practice?

Where did you get that from ?

You can perfectly use UTF8 in FPC code, but when calling a windows API, you should
a) convert UTF8 to UTF16 (or WideString). If you use the correct types, the compiler will do it for you most of the time.
b) Use the *W variant of a Windows system call.

Michael.
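Points a) and b) can be sketched as follows. This is a Windows-only illustration, not a recommendation from the thread: the program name is made up, the source file is assumed to be UTF-8 with {$codepage utf8} so that the literal is interpreted as intended, and WriteConsoleW is just one example of a *W call (it only works when the output handle is a real console, not a redirected file).

```pascal
program WideApiCall;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

uses
  Windows;

var
  U8: UTF8String;
  U16: UnicodeString;
  Written: DWORD;
begin
  U8 := 'áéí' + #13#10;

  // a) Convert UTF-8 to UTF-16. Because both variables have types that
  //    carry their encoding, the compiler inserts the conversion itself.
  U16 := UnicodeString(U8);

  // b) Call the *W variant of the Windows API with the UTF-16 data.
  Written := 0;
  if not WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE),
                       PWideChar(U16), Length(U16), Written, nil) then
    writeln('WriteConsoleW failed, error code ', GetLastError);
end.
```

Declaring the variable as UTF8String rather than plain string is what lets the compiler know which conversion to perform.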
Re: [Lazarus] Garbage writing to console
On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus wrote:
> OK, you mean it's a wiki page issue. I just didn't understand how we could
> solve it in Lazarus :)

I am interested in this thread because I was under the impression that UTF-8 support in Windows is fundamentally broken and should not be used (it interferes with the C libraries).

Not to take the thread offtopic, but can anyone comment on this in practice?

Cheers,
R0b0t1
Re: [Lazarus] Garbage writing to console
On 23.02.2018 9:23, Mattias Gaertner via Lazarus wrote:
> On Fri, 23 Feb 2018 08:56:55 +0100 Ondrej Pokorny via Lazarus wrote:
>> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
>>> But this is a Lazarus issue.
>>
>> Why?
>
> Because the wiki page is unclear.
>
> writeln('áéí'); // works only with source codepage UTF-8
>
> s:='áéí';
> writeln(s); // works with and without
>
> writeln(UTF8ToConsole('áéí')); // works with and without

OK, you mean it's a wiki page issue. I just didn't understand how we could solve it in Lazarus :)

Ondrej
Re: [Lazarus] Garbage writing to console
On Fri, 23 Feb 2018 08:56:55 +0100 Ondrej Pokorny via Lazarus wrote:
> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
> > But this is a Lazarus issue.
>
> Why?

Because the wiki page is unclear.

writeln('áéí'); // works only with source codepage UTF-8

s:='áéí';
writeln(s); // works with and without

writeln(UTF8ToConsole('áéí')); // works with and without

Mattias
Re: [Lazarus] Garbage writing to console
On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
> But this is a Lazarus issue.

Why?

Ondrej
Re: [Lazarus] Garbage writing to console
On Fri, 23 Feb 2018 08:44:23 +0100 Ondrej Pokorny via Lazarus wrote:
> OK, sorry. I wrote bullshit. But still, you should ask on fpc-devel list
> regarding compiler issues - the right people are there.

fpc-devel is about development of the compiler.
fpc-pascal is for normal user questions.
But this is a Lazarus issue.

Mattias
Re: [Lazarus] Garbage writing to console
OK, sorry. I wrote bullshit. But still, you should ask on fpc-devel list regarding compiler issues - the right people are there.

Ondrej
Re: [Lazarus] Garbage writing to console
On 22/02/18 at 22:26, Ondrej Pokorny via Lazarus wrote:
> On 22.02.2018 12:21, Luca Olivetti via Lazarus wrote:
>> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
>> application unchecked") or console application (using LazUTF8).
>> File encoding utf-8 without bom.
>>
>> writeln('áéí')
>>
>> produces garbage, contrary to what's said in
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>
> No, it's not contrary - just the opposite. The problem is that áéí
> (ANSI 1250 or whatever) have different codes to your console codepage
> (CP437 or whatever). You should open/edit your program source code in
> your console codepage - you obviously edit it in ANSI.

Uh? It's utf-8

"When you convert the code to UTF-8, for example by using Lazarus' source editor popup menu item File Settings / Encoding / UTF-8 and clicking on the dialog button "Change file on disk", the ╩ becomes 3 bytes (#226#149#169), so the literal becomes a string. *The procedures write and writeln convert the UTF-8 string to the current console codepage*. So your console program now outputs the '╩' on Windows with any codepage (i.e. not only CP437) and it even works on Linux and Mac OS X."

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007
Re: [Lazarus] Garbage writing to console
On 22.02.2018 12:21, Luca Olivetti via Lazarus wrote:
> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
> application unchecked") or console application (using LazUTF8).
> File encoding utf-8 without bom.
>
> writeln('áéí')
>
> produces garbage, contrary to what's said in
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console

No, it's not contrary - just the opposite. The problem is that áéí (ANSI 1250 or whatever) have different codes to your console codepage (CP437 or whatever). You should open/edit your program source code in your console codepage - you obviously edit it in ANSI.

> The strange thing is that:
>
> procedure w(const s:string);
> begin
>   writeln(s);
> end;
>
> w('áéí')
>
> also gives the correct output.

It looks like writeln ignores the codepage for constant strings but checks it for "normal" strings. You should ask on fpc-devel list for details.

Ondrej
Re: [Lazarus] Garbage writing to console
On 22/02/18 at 13:46, Luca Olivetti via Lazarus wrote:
> On 22/02/18 at 13:07, Tony Whyman via Lazarus wrote:
>> You may find the "SetTextCodePage" procedure useful when it comes to
>> setting the code page for a Windows console.
>>
>> e.g.
>>
>> SetTextCodePage(stdout,cp_utf8);
>
> Same result: garbage with writeln, correct result with w

Unsurprising since it's already cp_utf8 (according to GetTextCodePage)

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007
Re: [Lazarus] Garbage writing to console
On 22/02/18 at 13:07, Tony Whyman via Lazarus wrote:
> You may find the "SetTextCodePage" procedure useful when it comes to
> setting the code page for a Windows console.
>
> e.g.
>
> SetTextCodePage(stdout,cp_utf8);

Same result: garbage with writeln, correct result with w

> See also
> https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html
>
> On 22/02/18 11:21, Luca Olivetti via Lazarus wrote:
>> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
>> application unchecked") or console application (using LazUTF8).
>> File encoding utf-8 without bom.
>>
>> writeln('áéí')
>>
>> produces garbage, contrary to what's said in
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>>
>> using {$codepage utf8} (or utf-8 with bom) fixes it, but I'm not sure
>> it's recommended to do so:
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#String_Literals
>>
>> The strange thing is that:
>>
>> procedure w(const s:string);
>> begin
>>   writeln(s);
>> end;
>>
>> w('áéí')
>>
>> also gives the correct output.
>> Why? After all I'm using the same string literal.
>> I suppose there's some underlying automatic conversion going on, but
>> it's very confusing.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010) Fax +34 93 5883007
Re: [Lazarus] Garbage writing to console
You may find the "SetTextCodePage" procedure useful when it comes to setting the code page for a Windows console.

e.g.

SetTextCodePage(stdout,cp_utf8);

See also https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html

On 22/02/18 11:21, Luca Olivetti via Lazarus wrote:
> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
> application unchecked") or console application (using LazUTF8).
> File encoding utf-8 without bom.
>
> writeln('áéí')
>
> produces garbage, contrary to what's said in
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>
> using {$codepage utf8} (or utf-8 with bom) fixes it, but I'm not sure
> it's recommended to do so:
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#String_Literals
>
> The strange thing is that:
>
> procedure w(const s:string);
> begin
>   writeln(s);
> end;
>
> w('áéí')
>
> also gives the correct output.
> Why? After all I'm using the same string literal.
> I suppose there's some underlying automatic conversion going on, but
> it's very confusing.
>
> Bye
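For readers who want to try the case from the original report, the two calls can be put side by side in one program. This is a sketch under assumptions, not the reporter's actual project: the file must be saved as UTF-8 without BOM, no {$codepage} directive is used, the program name is invented, and SetMultiByteConversionCodePage is used as an RTL-only stand-in for the LazUTF8 unit mentioned in the report. Whether the two lines differ on a given machine depends on the FPC version and the console code page.

```pascal
program LiteralVsParam;
{$mode objfpc}{$H+}
// Assumption: this file is saved as UTF-8 without BOM and there is
// no {$codepage} directive, matching the setup in the report.

// Same helper as in the report: the literal arrives here as a
// string parameter instead of being handed to writeln directly.
procedure w(const s: string);
begin
  writeln(s);
end;

begin
  // Stand-in (assumption) for using the LazUTF8 unit: treat CP_ACP
  // strings as UTF-8 when the RTL converts them.
  SetMultiByteConversionCodePage(CP_UTF8);

  writeln('áéí');  // literal passed straight to writeln
  w('áéí');        // same literal, passed through a string parameter
end.
```

Adding SetTextCodePage(Output, CP_UTF8) before the two calls is a quick way to test the suggestion made at the top of this message.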