Re: Windows vs UTF-8 (issue 15845)

2016-04-04 Thread Kagamin via Digitalmars-d

On Monday, 4 April 2016 at 00:06:28 UTC, ag0aep6g wrote:

Weird how bad the support for UTF-8 seems to be in Windows.


UTF-8 is a newer technology. As early adopters of Unicode (before the 
Unicode 3.0 standard), Windows, OS X and Java used UCS-2 and later 
migrated to UTF-16.


Re: Windows vs UTF-8 (issue 15845)

2016-04-04 Thread Kagamin via Digitalmars-d

On Sunday, 3 April 2016 at 22:07:07 UTC, Martin Krejcirik wrote:

On Sunday, 3 April 2016 at 21:55:39 UTC, ag0aep6g wrote:

Does this make sense to anyone?


Using a UTF-8 console via the C API is broken in many ways on Windows. 
The problem is in the C library. The only sensible way is to use the 
Windows API. Related issues:


https://issues.dlang.org/show_bug.cgi?id=1448
https://issues.dlang.org/show_bug.cgi?id=15761


Last I checked Walter insisted that D I/O should be compatible 
with C I/O.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

On 4. 4. 2016 at 2:22, Adam D. Ruppe wrote:

On Monday, 4 April 2016 at 00:08:54 UTC, Martin Krejcirik wrote:

Probably not, it doesn't work with pipes. Oh well ...


It is easy to detect that though and branch accordingly.


I think it's not worth it. If Phobos just converted automatically from 
codepage to utf-8 for std streams, that would be enough. CP 65001 would 
still not work, but no one would notice.


--
mk


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Adam D. Ruppe via Digitalmars-d

On Monday, 4 April 2016 at 00:06:28 UTC, ag0aep6g wrote:

Weird how bad the support for UTF-8 seems to be in Windows.


Windows is more of a utf-16 system. It uses that internally, not 
utf-8, so conversions are often done anyway.
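
To illustrate the kind of conversion meant here, a small sketch using Phobos' std.utf (nothing Windows-specific; the two sample strings are just examples picked for illustration). It shows that the same text has different code unit counts in UTF-8 and UTF-16, and that it survives the round trip:

```d
import std.stdio : writeln;
import std.utf : toUTF16, toUTF8;

void main()
{
    string small = "ä";            // U+00E4, inside the Basic Multilingual Plane
    string big = "\U0001F600";     // U+1F600, outside the BMP

    wstring smallW = toUTF16(small);
    wstring bigW = toUTF16(big);

    // .length counts code units: bytes for string, 16-bit units for wstring.
    writeln(small.length, " ", smallW.length);
    writeln(big.length, " ", bigW.length);

    // Converting back yields the original UTF-8 text.
    assert(toUTF8(smallW) == small);
    assert(toUTF8(bigW) == big);
}
```

The second string needs a surrogate pair in UTF-16, which is the part UCS-2 could not express.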


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Adam D. Ruppe via Digitalmars-d

On Monday, 4 April 2016 at 00:08:54 UTC, Martin Krejcirik wrote:

Probably not, it doesn't work with pipes. Oh well ...


It is easy to detect that though and branch accordingly.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

On 4. 4. 2016 at 2:03, Adam D. Ruppe wrote:

ReadConsoleW works fine though in all my attempts, we should prolly just
change the library to use it.


Probably not, it doesn't work with pipes. Oh well ...

--
mk


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread ag0aep6g via Digitalmars-d

On Sunday, 3 April 2016 at 23:29:18 UTC, Martin Krejcirik wrote:
I think ReadConsole and WriteConsole API functions work with 
codepage 65001. (Sorry my previous reply went to your email).


Yeah, ReadConsole does work, somewhat. The data comes in as 
UTF-16, not UTF-8, though. And this time it only works when 
stdin is a TTY (opposite of ReadFile).


So our reading functions would have to query _isatty and choose 
ReadFile or ReadConsole depending on the result. When using 
ReadConsole, they would also have to convert from UTF-16 to 
UTF-8. At that point it would probably make sense to detect other 
code pages as well and convert from those to UTF-8.
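
A minimal sketch of that branching, assuming Windows and the core.sys.windows bindings. Note the swap: instead of _isatty it treats a successful GetConsoleMode call as the console test. The names readChunkUtf8 and unitsToUtf8 are hypothetical helpers made up for this example, not Phobos functions, and the fixed 256-unit buffers are arbitrary.

```d
import std.stdio : writeln;
import std.utf : toUTF8;

// Hypothetical helper: UTF-16 code units (as ReadConsoleW returns them) to UTF-8.
string unitsToUtf8(const(wchar)[] units)
{
    return toUTF8(units);
}

version (Windows)
{
    import core.sys.windows.windows;

    // Hypothetical helper: reads one chunk from stdin and returns it as UTF-8.
    string readChunkUtf8()
    {
        HANDLE h = GetStdHandle(STD_INPUT_HANDLE);
        DWORD mode;

        if (GetConsoleMode(h, &mode))
        {
            // stdin is a console: ReadConsoleW delivers UTF-16 code units.
            wchar[256] wbuf;
            DWORD unitsRead;
            if (!ReadConsoleW(h, wbuf.ptr, cast(DWORD) wbuf.length, &unitsRead, null))
                throw new Exception("ReadConsoleW failed");
            return unitsToUtf8(wbuf[0 .. unitsRead]);
        }

        // stdin is a pipe or file: ReadFile delivers raw bytes.
        // Assumption: whatever wrote those bytes already produced UTF-8.
        ubyte[256] bbuf;
        DWORD bytesRead;
        if (!ReadFile(h, bbuf.ptr, cast(DWORD) bbuf.length, &bytesRead, null))
            throw new Exception("ReadFile failed");
        return cast(string) bbuf[0 .. bytesRead].idup;
    }
}

void main()
{
    version (Windows)
        writeln(readChunkUtf8());
    else
        writeln(unitsToUtf8("ä"w)); // elsewhere only the conversion step applies
}
```

This leaves out a few things a real implementation would need: a chunk that ends in the middle of a surrogate pair, end-of-input on a closed pipe, and the detection of other code pages mentioned above.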


Weird how bad the support for UTF-8 seems to be in Windows.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Adam D. Ruppe via Digitalmars-d

On Sunday, 3 April 2016 at 23:11:53 UTC, anonymous wrote:

Doesn't seem to work for me.


Hmm, worked on my desktop but not my laptop... and I have no idea 
why now.


ReadConsoleW works fine though in all my attempts, we should 
prolly just change the library to use it.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

On 4. 4. 2016 at 0:52, ag0aep6g wrote:

reading UTF-8 is broken in Windows and there's no workaround, then issue
15845 can't be fixed, and we should stop telling people to use `chcp
65001` (and don't forget to change the font).


I think ReadConsole and WriteConsole API functions work with codepage 
65001. (Sorry my previous reply went to your email).


--
mk


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread anonymous via Digitalmars-d

On Sunday, 3 April 2016 at 22:48:21 UTC, Adam D. Ruppe wrote:
What happens if you give it a 4 char buffer? I imagine it would 
work fine then in all cases. It seems to for me.


Doesn't seem to work for me.

The exact code I tested:

import std.stdio;
import std.exception: enforce;
import core.sys.windows.windows;

void main()
{
    SetConsoleCP(65001);
    SetConsoleOutputCP(65001);

    uint readBytes;
    ubyte[4] c;
    ReadFile(GetStdHandle(STD_INPUT_HANDLE), c.ptr, c.length,
        &readBytes, null).enforce();
    writeln(readBytes, " ", c[]);
}


When I enter "a", it prints "3 [97, 13, 10, 0]".
When I enter "ä", it prints "0 [0, 0, 0, 0]".

I've also tried even larger buffers. Same result.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread ag0aep6g via Digitalmars-d

On 04.04.2016 00:31, Martin Krejcirik wrote:

Sorry, I missed that in your post. Anyway, after years of trying, I've
resorted to always converting to/from OEMCP.

You can use fromMBSz, toMBSz, GetConsoleCP, GetConsoleOutputCP functions
for that.


I'm not really asking for myself, but more for fixing issue 15845. If 
reading UTF-8 is broken in Windows and there's no workaround, then issue 
15845 can't be fixed, and we should stop telling people to use `chcp 
65001` (and don't forget to change the font).


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Adam D. Ruppe via Digitalmars-d

On Sunday, 3 April 2016 at 21:55:39 UTC, ag0aep6g wrote:
ReadFile(GetStdHandle(STD_INPUT_HANDLE), &c, 1, &readBytes, 
null).enforce();


I'm not sure if this is it or not, but you are asking for only 
one byte here, but giving it a multibyte sequence.


What happens if you give it a 4 char buffer? I imagine it would 
work fine then in all cases. It seems to for me.



The docs say it returns when "A write operation completes on the 
write end of the pipe." That's probably what is happening here, 
and then it doesn't have enough room in your buffer to put the 
message, so it reads zero. I'm not sure why it wouldn't return an 
error though... and it seems to remove the whole message from the 
buffer anyway... but still, it kinda makes sense that it wouldn't 
give you the partial input since it needs to be translated as a 
whole unit.


Regardless though, giving it a bigger buffer should work in all 
cases and has other benefits too, so that's probably what you 
should do.


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

On 4. 4. 2016 at 0:11, ag0aep6g wrote:

On 04.04.2016 00:07, Martin Krejcirik wrote:
I'm under the impression that ReadFile is a Windows API function. Is
that not so? If it isn't, what is the corresponding Windows API function?


Sorry, I missed that in your post. Anyway, after years of trying, I've 
resorted to always converting to/from OEMCP.


You can use fromMBSz, toMBSz, GetConsoleCP, GetConsoleOutputCP functions 
for that.
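
A rough sketch of how those functions could fit together, assuming Windows and std.windows.charset from Phobos. The wrapper names consoleToUtf8 and utf8ToConsole are hypothetical, invented for this example only:

```d
version (Windows)
{
    import core.sys.windows.windows;
    import std.string : toStringz;
    import std.windows.charset : fromMBSz, toMBSz;

    // Hypothetical helper: bytes in the console's input code page -> UTF-8.
    string consoleToUtf8(const(char)[] raw)
    {
        // fromMBSz expects a zero-terminated string, hence toStringz.
        return fromMBSz(toStringz(raw), GetConsoleCP());
    }

    // Hypothetical helper: UTF-8 -> zero-terminated bytes in the
    // console's output code page.
    const(char)* utf8ToConsole(const(char)[] text)
    {
        return toMBSz(text, GetConsoleOutputCP());
    }
}

void main()
{
    version (Windows)
    {
        import core.stdc.stdio : printf;

        // Print "ä" encoded for whatever code page the console currently uses.
        printf("%s\n", utf8ToConsole("ä"));
    }
}
```

Characters that the active code page cannot represent will not survive the conversion, which is the price of avoiding CP 65001.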


--
mk


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

And convert to non-unicode codepage (OEMCP) ...


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread ag0aep6g via Digitalmars-d

On 04.04.2016 00:07, Martin Krejcirik wrote:

Using a UTF-8 console via the C API is broken in many ways on Windows. The
problem is in the C library. The only sensible way is to use the Windows API.


I'm under the impression that ReadFile is a Windows API function. Is 
that not so? If it isn't, what is the corresponding Windows API function?


Re: Windows vs UTF-8 (issue 15845)

2016-04-03 Thread Martin Krejcirik via Digitalmars-d

On Sunday, 3 April 2016 at 21:55:39 UTC, ag0aep6g wrote:

Does this make sense to anyone?


Using a UTF-8 console via the C API is broken in many ways on Windows. 
The problem is in the C library. The only sensible way is to use the 
Windows API. Related issues:


https://issues.dlang.org/show_bug.cgi?id=1448
https://issues.dlang.org/show_bug.cgi?id=15761





Windows vs UTF-8 (issue 15845)

2016-04-03 Thread ag0aep6g via Digitalmars-d
When trying to make sense of issue 15845 [1], I've found Windows 
behaving outright broken. I don't have a clue about Windows programming, 
though, so it's very possible that I'm just missing something. I'd hope so.


Code:

import std.stdio;
import std.exception: enforce;
import core.sys.windows.windows;

void main()
{
    SetConsoleCP(65001);
    SetConsoleOutputCP(65001);

    uint readBytes;
    ubyte c;
    ReadFile(GetStdHandle(STD_INPUT_HANDLE), &c, 1, &readBytes, 
        null).enforce();

    writeln(readBytes, " ", c);
}


This works for ASCII characters. It does not work for non-ASCII 
characters, e.g. 'ü'. ReadFile does not indicate an error, but it also 
doesn't read anything.


I can't find any explanation for this in the documentation for ReadFile 
[2] or via Google. The same happens with -m32, -m64, fgetc, fgets. It 
also happens with equivalent C programs compiled with Visual Studio 2015.


I did find out that this apparently only happens when stdin is 
considered a TTY. According to _isatty [3], stdin is not a TTY when I 
use a pipe for input, e.g. `echo ä | test`, and then it works.


Does this make sense to anyone?


[1] https://issues.dlang.org/show_bug.cgi?id=15845
[2] https://msdn.microsoft.com/en-us/library/aa365467.aspx (ReadFile)
[3] https://msdn.microsoft.com/en-us/library/f4s0ddew.aspx (_isatty)