Re: [Lazarus] Garbage writing to console

2018-02-24 Thread R0b0t1 via Lazarus
On Sat, Feb 24, 2018 at 5:17 AM, Juha Manninen via Lazarus
 wrote:
> On Fri, Feb 23, 2018 at 11:15 AM, R0b0t1 via Lazarus
>  wrote:
>> Not to take the thread offtopic, ...
>
> You took it offtopic anyway.
> UTF-8 works just fine. Please read this page :
>  http://wiki.freepascal.org/Unicode_Support_in_Lazarus
> and start a new thread if you did not understand something.
>

It will not happen again, sir. I am a stain upon this mailing list.

My question was not about UTF-8 support in Lazarus. I know it works.
My question was about interoperability with Windows. Theoretically
there are some serious issues; I figured I would get a better response
asking someone using UTF-8 with Windows compared to a general poll.

Remorsefully,
 R0b0t1
-- 
___
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus


Re: [Lazarus] Garbage writing to console

2018-02-24 Thread Juha Manninen via Lazarus
On Fri, Feb 23, 2018 at 11:15 AM, R0b0t1 via Lazarus
 wrote:
> Not to take the thread offtopic, ...

You took it offtopic anyway.
UTF-8 works just fine. Please read this page :
 http://wiki.freepascal.org/Unicode_Support_in_Lazarus
and start a new thread if you did not understand something.

Juha


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Mattias Gaertner via Lazarus
On Fri, 23 Feb 2018 11:18:55 +0100
Mattias Gaertner via Lazarus  wrote:

>[...]
> Write and Writeln are pure compiler magic. 
> Apparently if the string is CP_ACP, it is passed unchanged to Output,
> ignoring the DefaultSystemCodepage variable.

I correct myself. ;)
My example of write(s) showed that CP_ACP string is converted to console
codepage.
Passing a string literal to writeln works differently. It is passed
unchanged to Output, ignoring the DefaultSystemCodepage variable.
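
The three cases discussed in this thread can be put side by side in one small test program. This is only a sketch under assumptions: the file is saved as UTF-8 without BOM, no {$codepage utf8} directive is used, and the LazUTF8 unit from Lazarus' LazUtils package is available. The program name is made up, and what appears on screen depends on the console codepage, so no expected output is given.

```pascal
program ConsoleLiteralDemo;
{$mode objfpc}{$H+}
// Assumption: saved as UTF-8 without BOM, and no {$codepage utf8} here.
uses
  LazUTF8; // provides UTF8ToConsole

var
  s: string;
begin
  // Case 1: literal passed directly; reported above as written unchanged.
  writeln('áéí');

  // Case 2: the same literal via a variable; reported above as converted
  // to the console codepage.
  s := 'áéí';
  writeln(s);

  // Case 3: explicit conversion through LazUTF8.
  writeln(UTF8ToConsole('áéí'));
end.
```

If case 1 shows garbage while cases 2 and 3 look right, that matches the behaviour reported for FPC 3.0.4 in this thread.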

Mattias



Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Ondrej Pokorny via Lazarus

On 23.02.2018 11:18, Mattias Gaertner via Lazarus wrote:
> Write and Writeln are pure compiler magic.
> Apparently if the string is CP_ACP, it is passed unchanged to Output,
> ignoring the DefaultSystemCodepage variable. Otherwise it converts the
> string to the console codepage. This allows to write arbitrary bytes.
> Maybe some compiler guru can confirm this.

Compiler gurus are reading fpc-devel :)

On 22.02.2018 22:26, Ondrej Pokorny via Lazarus wrote:
> It looks like writeln ignores the codepage for constant strings but
> checks it for "normal" strings. You should ask on fpc-devel list for
> details.


Ondrej


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Mattias Gaertner via Lazarus
On Fri, 23 Feb 2018 10:56:11 +0100
Luca Olivetti via Lazarus  wrote:

> On 23/02/18 at 09:23, Mattias Gaertner via Lazarus wrote:
> > On Fri, 23 Feb 2018 08:56:55 +0100
> > Ondrej Pokorny via Lazarus  wrote:
> >   
> >> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:  
> >>> But this is a Lazarus issue.  
> >>
> >> Why?  
> > 
> > Because the wiki page is unclear.
> > 
> > writeln('áéí'); // works only with source codepage UTF-8  
> 
> That's not enough, it has to be either utf8 with BOM or with
> {$codepage utf8}
> (Which has its own set of problems)

"source codepage UTF-8" = "utf8 with BOM" = "{$codepage UTF8}" =
"-Fcutf8"


> > s:='áéí';
> > writeln(s); // works with and without  
> 
> Correct, but I don't understand why
> 
> > 
> > writeln(UTF8ToConsole('áéí')); // works with and without  
> 
> Yes, that works too.

Write and Writeln are pure compiler magic. 
Apparently if the string is CP_ACP, it is passed unchanged to Output,
ignoring the DefaultSystemCodepage variable. Otherwise it converts the
string to the console codepage. This allows writing arbitrary bytes.
Maybe some compiler guru can confirm this.

Mattias


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread R0b0t1 via Lazarus
On Fri, Feb 23, 2018 at 3:25 AM, Michael Van Canneyt via Lazarus
 wrote:
>
>
> On Fri, 23 Feb 2018, R0b0t1 via Lazarus wrote:
>
>> On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus
>>  wrote:
>>>
>>> OK, you mean it's a wiki page issue. I just didn't understand how we
>>> could
>>> solve it in Lazarus :)
>>>
>>
>> I am interested in this thread because I was under the impression that
>> UTF-8 support in Windows is fundamentally broken and should not be
>> used (it interferes with the C libraries).
>>
>> Not to take the thread offtopic, but can anyone comment on this in
>> practice?
>
>
> Where did you get that from ?
>
> You can perfectly use UTF8 in FPC code, but when calling a windows API, you
> should a) convert UTF8 to UTF16 (or WideString). If you use the correct
> types,
>the compiler will do it for you most of the time.
> b) Use the *W variant of a Windows system call.
>

The combination of those is the largest part of why
http://utf8everywhere.org/ and many independent developers recommend
avoiding Windows' implementation of UTF-8. It doesn't end up doing you
any good, because you typically cannot set the whole system to use
UTF-8 (because it is broken).

The brokenness (described at
https://social.msdn.microsoft.com/Forums/vstudio/en-US/e4b91f49-6f60-4ffe-887a-e18e39250905/possible-bugs-in-writefile-and-crt-unicode-issues?forum=vcgeneral)
is due to the UTF-8 codepage causing Windows to report multibyte
characters as a single character, and stdio assuming one byte per
character.
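
To make the byte-versus-character mismatch concrete, here is a small Free Pascal sketch. It assumes nothing about Windows itself; it only shows that the same text has different lengths depending on whether bytes or UTF-16 code units are counted. The program and variable names are made up for illustration.

```pascal
program Utf8Lengths;
{$mode objfpc}{$H+}

var
  s: RawByteString;
begin
  // 'áéí' written as explicit UTF-8 bytes, so the encoding of this
  // source file does not influence the result.
  s := #$C3#$A1#$C3#$A9#$C3#$AD;

  // Length on an ansistring counts bytes.
  writeln('bytes:      ', Length(s));

  // UTF8Decode yields a UnicodeString; Length then counts UTF-16 code units.
  writeln('characters: ', Length(UTF8Decode(s)));
end.
```

Code that assumes one byte per character would treat this three-letter string as six characters.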

The Chinese/Japanese mappings apparently had these problems as well,
but a workaround was added.

Cheers,
 R0b0t1


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Luca Olivetti via Lazarus

On 23/02/18 at 09:23, Mattias Gaertner via Lazarus wrote:
> On Fri, 23 Feb 2018 08:56:55 +0100
> Ondrej Pokorny via Lazarus  wrote:
>
>> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
>>> But this is a Lazarus issue.
>>
>> Why?
>
> Because the wiki page is unclear.
>
> writeln('áéí'); // works only with source codepage UTF-8

That's not enough, it has to be either utf8 with BOM or with
{$codepage utf8}
(Which has its own set of problems)

> s:='áéí';
> writeln(s); // works with and without

Correct, but I don't understand why

> writeln(UTF8ToConsole('áéí')); // works with and without

Yes, that works too.

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Michael Van Canneyt via Lazarus



On Fri, 23 Feb 2018, R0b0t1 via Lazarus wrote:

> On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus
>  wrote:
>>
>> OK, you mean it's a wiki page issue. I just didn't understand how we could
>> solve it in Lazarus :)
>>
>
> I am interested in this thread because I was under the impression that
> UTF-8 support in Windows is fundamentally broken and should not be
> used (it interferes with the C libraries).
>
> Not to take the thread offtopic, but can anyone comment on this in practice?

Where did you get that from ?

You can perfectly use UTF8 in FPC code, but when calling a windows API, you
should
a) convert UTF8 to UTF16 (or WideString). If you use the correct types,
   the compiler will do it for you most of the time.
b) Use the *W variant of a Windows system call.
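
A minimal, Windows-only sketch of points a) and b), doing the conversion by hand rather than relying on the compiler. The program and variable names are made up for illustration; MessageBoxW is used only as a familiar example of a *W call.

```pascal
program MessageBoxUtf8;
{$mode objfpc}{$H+}
// Windows-only sketch; it will not compile on other targets.
uses
  Windows;

var
  Utf8Text: RawByteString;
  WideText, Caption: UnicodeString;
begin
  // 'áéí' as explicit UTF-8 bytes, independent of the source file encoding.
  Utf8Text := #$C3#$A1#$C3#$A9#$C3#$AD;

  // a) convert UTF-8 to UTF-16
  WideText := UTF8Decode(Utf8Text);
  Caption := 'Demo';

  // b) call the *W variant of the API
  MessageBoxW(0, PWideChar(WideText), PWideChar(Caption), MB_OK);
end.
```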


Michael.


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread R0b0t1 via Lazarus
On Fri, Feb 23, 2018 at 2:29 AM, Ondrej Pokorny via Lazarus
 wrote:
> OK, you mean it's a wiki page issue. I just didn't understand how we could
> solve it in Lazarus :)
>

I am interested in this thread because I was under the impression that
UTF-8 support in Windows is fundamentally broken and should not be
used (it interferes with the C libraries).

Not to take the thread offtopic, but can anyone comment on this in practice?

Cheers,
 R0b0t1


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Ondrej Pokorny via Lazarus

On 23.02.2018 9:23, Mattias Gaertner via Lazarus wrote:
> On Fri, 23 Feb 2018 08:56:55 +0100
> Ondrej Pokorny via Lazarus  wrote:
>
>> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
>>> But this is a Lazarus issue.
>>
>> Why?
>
> Because the wiki page is unclear.
>
> writeln('áéí'); // works only with source codepage UTF-8
>
> s:='áéí';
> writeln(s); // works with and without
>
> writeln(UTF8ToConsole('áéí')); // works with and without

OK, you mean it's a wiki page issue. I just didn't understand how we
could solve it in Lazarus :)


Ondrej


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Mattias Gaertner via Lazarus
On Fri, 23 Feb 2018 08:56:55 +0100
Ondrej Pokorny via Lazarus  wrote:

> On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
> > But this is a Lazarus issue.  
> 
> Why?

Because the wiki page is unclear.

writeln('áéí'); // works only with source codepage UTF-8

s:='áéí';
writeln(s); // works with and without

writeln(UTF8ToConsole('áéí')); // works with and without


Mattias


Re: [Lazarus] Garbage writing to console

2018-02-23 Thread Ondrej Pokorny via Lazarus

On 23.02.2018 8:46, Mattias Gaertner via Lazarus wrote:
> But this is a Lazarus issue.

Why?

Ondrej


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Mattias Gaertner via Lazarus
On Fri, 23 Feb 2018 08:44:23 +0100
Ondrej Pokorny via Lazarus  wrote:

> OK, sorry. I wrote bullshit. But still, you should ask on fpc-devel list 
> regarding compiler issues - the right people are there.

fpc-devel is about development of the compiler.
fpc-pascal is for normal user questions.

But this is a Lazarus issue.

Mattias


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Ondrej Pokorny via Lazarus
OK, sorry. I wrote bullshit. But still, you should ask on fpc-devel list 
regarding compiler issues - the right people are there.


Ondrej


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Luca Olivetti via Lazarus

On 22/02/18 at 22:26, Ondrej Pokorny via Lazarus wrote:
> On 22.02.2018 12:21, Luca Olivetti via Lazarus wrote:
>> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
>> application unchecked") or console application (using LazUTF8).
>>
>> File encoding utf-8 without bom.
>>
>> writeln('áéí')
>>
>> produces garbage, contrary to what's said in
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>
> No, it's not contrary - just the opposite. The problem is that áéí (ANSI
> 1250 or whatever) have different codes to your console codepage (CP437
> or whatever). You should open/edit your program source code in your
> console codepage - you obviously edit it in ANSI.

Uh? It's utf-8

"When you convert the code to UTF-8, for example by using Lazarus'
source editor popup menu item File Settings / Encoding / UTF-8 and
clicking on the dialog button "Change file on disk", the ╩ becomes 3
bytes (#226#149#169), so the literal becomes a string. *The procedures
write and writeln convert the UTF-8 string to the current console
codepage*. So your console program now outputs the '╩' on Windows with
any codepage (i.e. not only CP437) and it even works on Linux and Mac OS X."


Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Ondrej Pokorny via Lazarus

On 22.02.2018 12:21, Luca Olivetti via Lazarus wrote:
> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
> application unchecked") or console application (using LazUTF8).
>
> File encoding utf-8 without bom.
>
> writeln('áéí')
>
> produces garbage, contrary to what's said in
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console

No, it's not contrary - just the opposite. The problem is that áéí (ANSI
1250 or whatever) have different codes to your console codepage (CP437
or whatever). You should open/edit your program source code in your
console codepage - you obviously edit it in ANSI.

> The strange thing is that:
>
> procedure w(const s:string);
> begin
>   writeln(s);
> end;
>
> w('áéí')
>
> also gives the correct output.

It looks like writeln ignores the codepage for constant strings but
checks it for "normal" strings. You should ask on fpc-devel list for
details.


Ondrej


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Luca Olivetti via Lazarus

On 22/02/18 at 13:46, Luca Olivetti via Lazarus wrote:
> On 22/02/18 at 13:07, Tony Whyman via Lazarus wrote:
>> You may find the "SetTextCodePage" procedure useful when it comes to
>> setting the code page for a Windows console.
>>
>> e.g. SetTextCodePage(stdout,cp_utf8);
>
> Same result: garbage with writeln, correct result with w

Unsurprising since it's already cp_utf8 (according to GetTextCodePage)
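
For anyone who wants to repeat this check, a small sketch of how the codepage of Output can be read and set with the System unit routines mentioned above. The program name is made up, and the numbers printed depend on the platform and on which units are used, so no expected output is given.

```pascal
program ShowTextCodePage;
{$mode objfpc}{$H+}

begin
  // Codepage currently associated with the standard output text file.
  writeln('Output codepage before: ', GetTextCodePage(Output));

  // Ask the RTL to treat Output as UTF-8 (CP_UTF8 is 65001).
  SetTextCodePage(Output, CP_UTF8);

  writeln('Output codepage after:  ', GetTextCodePage(Output));
end.
```

If both lines print 65001, the call changes nothing, which is the situation described in this message.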

Bye
--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Luca Olivetti via Lazarus

On 22/02/18 at 13:07, Tony Whyman via Lazarus wrote:
> You may find the "SetTextCodePage" procedure useful when it comes to
> setting the code page for a Windows console.
>
> e.g. SetTextCodePage(stdout,cp_utf8);

Same result: garbage with writeln, correct result with w

> See also
> https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html
>
> On 22/02/18 11:21, Luca Olivetti via Lazarus wrote:
>> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
>> application unchecked") or console application (using LazUTF8).
>>
>> File encoding utf-8 without bom.
>>
>> writeln('áéí')
>>
>> produces garbage, contrary to what's said in
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>>
>> using {$codepage utf8} (or utf-8 with bom) fixes it, but I'm not sure
>> it's recommended to do so:
>> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#String_Literals
>>
>> The strange thing is that:
>>
>> procedure w(const s:string);
>> begin
>>   writeln(s);
>> end;
>>
>> w('áéí')
>>
>> also gives the correct output.
>> Why? After all I'm using the same string literal.
>> I suppose there's some underlying automatic conversion going on, but
>> it's very confusing.
>>
>> Bye


--
Luca Olivetti
Wetron Automation Technology http://www.wetron.es/
Tel. +34 93 5883004 (Ext.3010)  Fax +34 93 5883007


Re: [Lazarus] Garbage writing to console

2018-02-22 Thread Tony Whyman via Lazarus
You may find the "SetTextCodePage" procedure useful when it comes to 
setting the code page for a Windows console.


e.g. SetTextCodePage(stdout,cp_utf8);

See also 
https://www.freepascal.org/docs-html/rtl/system/settextcodepage.html



On 22/02/18 11:21, Luca Olivetti via Lazarus wrote:
> Lazarus 1.8.0, fpc 3.0.4, windows application (with "win32 gui
> application unchecked") or console application (using LazUTF8).
>
> File encoding utf-8 without bom.
>
> writeln('áéí')
>
> produces garbage, contrary to what's said in
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#Writing_to_console
>
> using {$codepage utf8} (or utf-8 with bom) fixes it, but I'm not sure
> it's recommended to do so:
> http://wiki.freepascal.org/Unicode_Support_in_Lazarus#String_Literals
>
> The strange thing is that:
>
> procedure w(const s:string);
> begin
>   writeln(s);
> end;
>
> w('áéí')
>
> also gives the correct output.
> Why? After all I'm using the same string literal.
> I suppose there's some underlying automatic conversion going on, but
> it's very confusing.
>
> Bye

