Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/23/19 2:52 PM, Symphony wrote:

Pardon my ignorance, but wouldn't the inclusion of a std.io (e.g. Martin 
Nowak's io library) into Phobos be an easier and cleaner move? Other 
Phobos modules that require std.stdio could be gradually changed so that 
they use std.io instead.


Well, that's certainly a much easier project. But one might question 
whether we should do it unless we have a reason to have Phobos start 
using it. As bachmeier mentioned, it can happily exist in its own location.


The "gradual change" thing, I don't know how that works.

Also note that std.io has no buffering. You need something like iopipe 
on top of it for it to be reasonably usable.


There would be the issue of two coexisting IO 
libraries in std, but issuing some warnings whenever std.stdio is 
imported wouldn't be too bad in my view; that is unless Mr. Bright's 
opposition is the main blocker.


It's not without precedent though. There actually was an alternate 
stream system in Phobos, now in undead: 
https://github.com/dlang/undeaD/blob/master/src/undead/stream.d


But I think before we think about making the attempt to get this 
accepted, we really need to flesh out the end goal. The maintainers have 
soured a bit I think on the std.experimental location, especially since 
we do have code.dlang.org. The bar for entry is high for Phobos.


My recommendation is to focus on getting the std.io project and the 
iopipe project to be usable and fully featured. Then it may be a much 
easier task to convince leadership that they should be in Phobos.


-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Symphony via Digitalmars-d-learn
On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer 
wrote:
I really appreciate the enthusiasm here, but at the risk of 
being cynical, I see little chance that this gets accepted. 
Before you spend any time on actual code, a DIP is going to be 
required, as this would be a huge change to the language. I'm 
sure you have a lot of time, but I don't want you to waste it 
on something that is likely to be rejected.


If you still want to proceed, even at the risk of doing a lot 
of work for nothing (or at least, a lot of work that ends up 
being just on code.dlang.org instead of Phobos), I can tell you 
what my plan was:


1. std.stdio.File was going to be set up to source from either 
an iopipe-based io subsystem, or a FILE *.


2. The standard handles would be opened with the default C FILE * 
standard handles as the source/target.


3. Upon using any "D-like" features on a File that is sourced 
from a FILE * (e.g. byLine), the File would be switched to a 
newly-created iopipe-based source. The theory here is that 
once you do something like this, you commit to using D on it, 
and I'd much rather use a higher-performing subsystem (iopipe 
currently beats Phobos by 2x in performance). This only counts 
for things that make the File unusable on its own anyway. So 
writefln and writeln would NOT switch the source, and neither 
would lockingTextReader/Writer.


4. Any new File that is opened using any constructor other than 
passing in a FILE * will be opened with an iopipe source.


5. The iopipe and io subsystems can be used directly instead of 
with File, as a lot of times you don't need that overhead.


Let me know if you decide to do this, I can guide you.

-Steve
Pardon my ignorance, but wouldn't the inclusion of a std.io (e.g. 
Martin Nowak's io library) into Phobos be an easier and cleaner 
move? Other Phobos modules that require std.stdio could be 
gradually changed so that they use std.io instead. There would be 
the issue of two coexisting IO libraries in std, but issuing some 
warnings whenever std.stdio is imported wouldn't be too bad in my 
view; that is unless Mr. Bright's opposition is the main blocker.


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/23/19 11:02 AM, Adam D. Ruppe wrote:

On Monday, 23 December 2019 at 15:41:33 UTC, Steven Schveighoffer wrote:
That means we have to buffer separately, which means we have a problem 
interleaving printf with writef. It would be awful.


Or simply don't buffer. Any call you get, flush the C buffer and write 
the D stuff immediately.


Unbuffered output would perform badly, especially if you are writing 
characters at a time (which is what formattedWrite does). But I think 
this would solve the interleaving problem.


Remember, this code branch is only called if we already know it is an 
interactive console. They're usually flushed frequently (at least at 
every line) anyway... so especially with writeln / writefln those are 
virtually guaranteed and certainly expected to flush at the end anyway. 
I really don't think any performance concern would be significant.


Honestly, I think it sounds horrible to have yet another special case 
for this specific situation. But also, I almost never use Windows for D 
work, so I'm fine if you want to duct tape some more cruft onto that 
branch. std.stdio is already a pretty big mess.


-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread H. S. Teoh via Digitalmars-d-learn
On Mon, Dec 23, 2019 at 10:41:33AM -0500, Steven Schveighoffer via 
Digitalmars-d-learn wrote:
[...]
> There's this guy, his name is Walter. He likes printf. I'm pretty sure
> when he's buried, his cold dead fingers will be tightly and
> inextricably wrapped around printf.
[...]

But that's not a problem; since he loves printf so much, he'd never use
std.stdio.write* in the first place.  No conflict there. :-D


T

-- 
INTEL = Only half of "intelligence".


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Adam D. Ruppe via Digitalmars-d-learn
On Monday, 23 December 2019 at 15:41:33 UTC, Steven Schveighoffer 
wrote:
That means we have to buffer separately, which means we have a 
problem interleaving printf with writef. It would be awful.


Or simply don't buffer. Any call you get, flush the C buffer and 
write the D stuff immediately.


Remember, this code branch is only called if we already know it 
is an interactive console. They're usually flushed frequently (at 
least at every line) anyway... so especially with writeln / 
writefln those are virtually guaranteed and certainly expected to 
flush at the end anyway. I really don't think any performance 
concern would be significant.


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/23/19 10:48 AM, bachmeier wrote:

On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer wrote:

I really appreciate the enthusiasm here, but at the risk of being 
cynical, I see little chance that this gets accepted. Before you spend 
any time on actual code, a DIP is going to be required, as this would 
be a huge change to the language. I'm sure you have a lot of time, but 
I don't want you to waste it on something that is likely to be rejected.


If you still want to proceed, even at the risk of doing a lot of work 
for nothing (or at least, a lot of work that ends up being just on 
code.dlang.org instead of Phobos)


Just out of curiosity, what would be the advantage of having something 
like this in Phobos rather than as a separate package?


It means that all of Phobos can take advantage of the better performance 
and other benefits.


For instance, std.process uses File (and therefore FILE *) as its 
streams for the pipes to the child process. This has huge limitations.


-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread bachmeier via Digitalmars-d-learn
On Monday, 23 December 2019 at 15:34:13 UTC, Steven Schveighoffer 
wrote:


I really appreciate the enthusiasm here, but at the risk of 
being cynical, I see little chance that this gets accepted. 
Before you spend any time on actual code, a DIP is going to be 
required, as this would be a huge change to the language. I'm 
sure you have a lot of time, but I don't want you to waste it 
on something that is likely to be rejected.


If you still want to proceed, even at the risk of doing a lot 
of work for nothing (or at least, a lot of work that ends up 
being just on code.dlang.org instead of Phobos)


Just out of curiosity, what would be the advantage of having 
something like this in Phobos rather than as a separate package?


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/23/19 10:25 AM, H. S. Teoh wrote:

On Sun, Dec 22, 2019 at 10:04:20PM +, Adam D. Ruppe via Digitalmars-d-learn 
wrote:
[...]

Regardless, I'm pretty well of the opinion that fwrite is the wrong
thing to do anyway. fwrite writes bytes to a file, but we want to
write strings to the console. There's other functions that do that.

[...]

Would it make sense for std.stdio.write* (the package global functions,
as opposed to File.write*) to use the Windows console output functions
instead of proxying to libc?


That means we have to buffer separately, which means we have a problem 
interleaving printf with writef. It would be awful.



Alternatively, we could change std.stdio.File to check if the current
file descriptor is the console (fd == stdout && stdout == console,
however you figure that out in Windows), and silently switch to the
Windows console output functions instead of libc.  We *are* already
wrapping libc's FILE*, why not wrap the Windows console output functions
as well.


Again, the docs say you have to use wprintf, not fwrite. We would have 
to switch to using wprintf, and I'm not sure it's a very easy thing to 
do. It might be possible, though.




Mixing raw libc printf with std.stdio.write* is a bad idea anyway; do we
really need to support that??  Though calling fflush(stdout) may not be
amiss, just to alleviate sudden breakage and ensuing complaints.


There's this guy, his name is Walter. He likes printf. I'm pretty sure 
when he's buried, his cold dead fingers will be tightly and inextricably 
wrapped around printf.



And of course, this only applies to Windows. On Posix libc is pretty
much still the standard way of working with console output.


The point of this thread is getting valid Unicode to come out on the 
screen, which I'm pretty sure Posix systems support just fine.


Other than that, there are good reasons NOT to use libc, but this is 
disruptive and difficult to get right as a "drop-in".


-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/22/19 11:53 PM, Symphony wrote:

On Sunday, 22 December 2019 at 22:47:43 UTC, Steven Schveighoffer wrote:
To fix Phobos, we just(!) need to remove libc as the underlying stream 
implementation.


I had at one point agreement from Walter to make a 
"backwards-compatible-ish" mechanism for file/streams. But it's not 
pretty, and was convoluted. At the time, I was struggling getting what 
would become iopipe to be usable on its own, and I eventually quit 
worrying about that aspect of it.


We have the basic building blocks with 
https://github.com/MartinNowak/io and 
https://github.com/schveiguy/iopipe. It would be cool to get this into 
Phobos, but it's a lot of work.


I bet Rust just skips libc altogether.

I don't have the ingenuity, intelligence, nor experience that many of 
you possess, but I have *a lot* of time on my hands for something like 
this. I assume I should start with std.stdio's source code and the 
aforementioned projects' source code, but some guidance on this would be 
very helpful, if not needed. D has been quite useful to me since I 
stumbled upon it, and I think it's time to give back in some way. (I'd 
do it financially, but I'm poor, haha) Anyway, if anybody wants to take 
me up on this offer, just let me know!


I really appreciate the enthusiasm here, but at the risk of being 
cynical, I see little chance that this gets accepted. Before you spend 
any time on actual code, a DIP is going to be required, as this would be 
a huge change to the language. I'm sure you have a lot of time, but I 
don't want you to waste it on something that is likely to be rejected.


If you still want to proceed, even at the risk of doing a lot of work 
for nothing (or at least, a lot of work that ends up being just on 
code.dlang.org instead of Phobos), I can tell you what my plan was:


1. std.stdio.File was going to be set up to source from either an 
iopipe-based io subsystem, or a FILE *.


2. The standard handles would be opened with the default C FILE * standard 
handles as the source/target.


3. Upon using any "D-like" features on a File that is sourced from a 
FILE * (e.g. byLine), the File would be switched to a newly-created 
iopipe-based source. The theory here is that once you do something like 
this, you commit to using D on it, and I'd much rather use a 
higher-performing subsystem (iopipe currently beats Phobos by 2x in 
performance). This only counts for things that make the File unusable on 
its own anyway. So writefln and writeln would NOT switch the source, and 
neither would lockingTextReader/Writer.


4. Any new File that is opened using any constructor other than passing 
in a FILE * will be opened with an iopipe source.


5. The iopipe and io subsystems can be used directly instead of with 
File, as a lot of times you don't need that overhead.
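A rough sketch of what the source-switching in step 3 might look like. All names here (the iopipe-source placeholder, ensureIopipe) are hypothetical illustrations of the plan, not real std.io or iopipe APIs:

```d
// Hypothetical sketch of the File source-switching idea (step 3 above).
// The iopipe/std.io types are stand-ins; only the flow is the point.
import core.stdc.stdio : FILE, fflush;

struct File
{
    FILE* cFile;          // legacy C-backed source, if any
    void* iopipeSource;   // placeholder for an iopipe-based chain

    // Called internally by "D-like" features such as byLine.
    private void ensureIopipe()
    {
        if (cFile !is null && iopipeSource is null)
        {
            fflush(cFile);  // don't lose data sitting in the C buffer
            // iopipeSource = ...wrap the underlying descriptor with
            // std.io/iopipe (hypothetical; the real conversion would take
            // ownership and leave the FILE * unusable on its own)...
        }
    }

    // writeln/writefln paths would NOT call ensureIopipe, per the plan.
}
```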


Let me know if you decide to do this, I can guide you.

-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-23 Thread H. S. Teoh via Digitalmars-d-learn
On Sun, Dec 22, 2019 at 10:04:20PM +, Adam D. Ruppe via Digitalmars-d-learn 
wrote:
[...]
> Regardless, I'm pretty well of the opinion that fwrite is the wrong
> thing to do anyway. fwrite writes bytes to a file, but we want to
> write strings to the console. There's other functions that do that.
[...]

Would it make sense for std.stdio.write* (the package global functions,
as opposed to File.write*) to use the Windows console output functions
instead of proxying to libc?

Alternatively, we could change std.stdio.File to check if the current
file descriptor is the console (fd == stdout && stdout == console,
however you figure that out in Windows), and silently switch to the
Windows console output functions instead of libc.  We *are* already
wrapping libc's FILE*, why not wrap the Windows console output functions
as well.
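The "is this actually a console?" check on Windows is usually done with GetConsoleMode: it succeeds only on real console handles, and fails on pipes and files. A minimal sketch (Windows-only, so not portable):

```d
// Sketch: detect whether stdout is an interactive Windows console.
// If GetConsoleMode succeeds on the handle, it is a real console and
// WriteConsoleW could be used; if it fails (pipe/redirected file),
// the byte-oriented path should be kept.
version (Windows)
{
    import core.sys.windows.windows;

    bool stdoutIsConsole()
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD mode;
        return h != INVALID_HANDLE_VALUE && GetConsoleMode(h, &mode) != 0;
    }
}
```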

Mixing raw libc printf with std.stdio.write* is a bad idea anyway; do we
really need to support that??  Though calling fflush(stdout) may not be
amiss, just to alleviate sudden breakage and ensuing complaints.

And of course, this only applies to Windows. On Posix libc is pretty
much still the standard way of working with console output.


T

-- 
VI = Visual Irritation


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Symphony via Digitalmars-d-learn
On Sunday, 22 December 2019 at 22:47:43 UTC, Steven Schveighoffer 
wrote:
To fix Phobos, we just(!) need to remove libc as the underlying 
stream implementation.


I had at one point agreement from Walter to make a 
"backwards-compatible-ish" mechanism for file/streams. But it's 
not pretty, and was convoluted. At the time, I was struggling 
getting what would become iopipe to be usable on its own, and I 
eventually quit worrying about that aspect of it.


We have the basic building blocks with 
https://github.com/MartinNowak/io and 
https://github.com/schveiguy/iopipe. It would be cool to get 
this into Phobos, but it's a lot of work.


I bet Rust just skips libc altogether.

-Steve
I don't have the ingenuity, intelligence, nor experience that 
many of you possess, but I have *a lot* of time on my hands for 
something like this. I assume I should start with std.stdio's 
source code and the aforementioned projects' source code, but 
some guidance on this would be very helpful, if not needed. D has 
been quite useful to me since I stumbled upon it, and I think 
it's time to give back in some way. (I'd do it financially, but 
I'm poor, haha) Anyway, if anybody wants to take me up on this 
offer, just let me know!


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/22/19 5:04 PM, Adam D. Ruppe wrote:

On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer wrote:
Phobos doesn't call the wrong function, libc does. Phobos uses fwrite 
for output.


There is allegedly a way to set fwrite to do the translations on MSVCRT:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019 


Looks like you need to switch to "wprintf". I'm not sure, but I think we 
rely only on fwrite, for which there is no "w" equivalent.



but trying it here it throws invalid parameter exception so idk.


Not surprised ;)

Here's a cool feature of Windows:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/fwide?view=vs-2019

Basically it does nothing; all parameters are ignored (and yes, we use 
this function in Phobos, assuming it does something).


But let me just say, the fact that there is some "mode" you have to set, 
like binary mode, that makes unicode work is unsettling. I hate libc 
streams...




Regardless, I'm pretty well of the opinion that fwrite is the wrong 
thing to do anyway. fwrite writes bytes to a file, but we want to write 
strings to the console. There's other functions that do that.


Preaching to the choir here. I wanted to rip out libc reliance a decade ago.

There is the worry of mixing stuff from C and keeping the buffer 
consistent, but it could always just flush() before doing its thing too. 
Or maybe even merge the buffers, idk what the MS runtime supports for that.


This is the crux. Some people gotta have their printf. And if you do 
different types of buffered streams, the result even from 
single-threaded output looks like garbage. The only solution is to wrap 
FILE *. And I do mean only. I looked into trying to hook the buffers. 
There's no reliable way without knowing all the implementation details.



or maybe i'm missing something and _setmode is a viable solution.


_setmode operates on a file descriptor. That already is a red flag to 
me, as there are no file descriptors in the OS; Windows uses handles. So 
this has some weird library "translation" happening underneath. Ugh.


But whatever we do, passing the buck isn't solving anything. Windows has 
supported Unicode console output since NT 4.0 in 1996... you just have 
to call the right function, and whether it is Phobos calling it or 
druntime or the CRT, someone just needs to do it!


Hey, you can always just call the function yourself! Just make an output 
stream that writes with the right function, and then you can use 
formattedWrite instead of writef.
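That suggestion can be sketched as a tiny output range wrapping WriteConsoleW, which formattedWrite then drives. This is an illustrative sketch only: a real version would have to buffer partial UTF-8 sequences, since toUTF16 throws if a multi-byte character is split across put calls:

```d
// Sketch: an output range that writes through WriteConsoleW, usable
// with std.format.formattedWrite instead of std.stdio.writef.
version (Windows)
{
    import core.sys.windows.windows;
    import std.format : formattedWrite;
    import std.utf : toUTF16;

    struct ConsoleWriter
    {
        HANDLE h;

        void put(const(char)[] s)
        {
            // The console API wants UTF-16. NOTE: assumes s holds only
            // whole code points; a robust version must buffer partial
            // UTF-8 sequences between calls.
            auto w = s.toUTF16;
            DWORD written;
            WriteConsoleW(h, w.ptr, cast(DWORD) w.length, &written, null);
        }
    }

    void demo()
    {
        auto con = ConsoleWriter(GetStdHandle(STD_OUTPUT_HANDLE));
        con.formattedWrite("%s ♥\n", "hello");
    }
}
```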


To fix Phobos, we just(!) need to remove libc as the underlying stream 
implementation.


I had at one point agreement from Walter to make a 
"backwards-compatible-ish" mechanism for file/streams. But it's not 
pretty, and was convoluted. At the time, I was struggling getting what 
would become iopipe to be usable on its own, and I eventually quit 
worrying about that aspect of it.


We have the basic building blocks with https://github.com/MartinNowak/io 
and https://github.com/schveiguy/iopipe. It would be cool to get this 
into Phobos, but it's a lot of work.


I bet Rust just skips libc altogether.

-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Adam D. Ruppe via Digitalmars-d-learn
On Sunday, 22 December 2019 at 18:41:16 UTC, Steven Schveighoffer 
wrote:
Phobos doesn't call the wrong function, libc does. Phobos uses 
fwrite for output.


There is allegedly a way to set fwrite to do the translations on 
MSVCRT:

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=vs-2019

but trying it here it throws invalid parameter exception so idk.
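For reference, what the linked docs describe looks roughly like this (declarations written out by hand here; _O_U16TEXT's value is the one from the MSVC headers). After the mode switch, only wide output functions are allowed on that stream, which is exactly why mixing it with fwrite-based code blows up:

```d
// Sketch of the documented MSVCRT approach: put the CRT stream into
// UTF-16 mode, then use only wide output functions on it.
version (Windows)
{
    // Hand-written CRT declarations (these live in the MSVC headers).
    extern (C) int _setmode(int fd, int mode);
    extern (C) int wprintf(const(wchar)* fmt, ...);

    enum _O_U16TEXT = 0x20000; // from the MSVC <fcntl.h>

    void tryU16Mode()
    {
        _setmode(1, _O_U16TEXT); // fd 1 == stdout
        wprintf("♥\n"w.ptr);     // narrow printf/fwrite are now invalid on stdout
    }
}
```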

Regardless, I'm pretty well of the opinion that fwrite is the 
wrong thing to do anyway. fwrite writes bytes to a file, but we 
want to write strings to the console. There's other functions 
that do that.


There is the worry of mixing stuff from C and keeping the buffer 
consistent, but it could always just flush() before doing its 
thing too. Or maybe even merge the buffers, idk what the MS 
runtime supports for that.


or maybe i'm missing something and _setmode is a viable solution.


But whatever we do, passing the buck isn't solving anything. 
Windows has supported Unicode console output since NT 4.0 in 
1996... you just have to call the right function, and whether it 
is Phobos calling it or druntime or the CRT, someone just needs 
to do it!


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Steven Schveighoffer via Digitalmars-d-learn

On 12/22/19 8:40 AM, Adam D. Ruppe wrote:

On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole wrote:

Not a bug.


No, Phobos is *clearly* in the wrong here. There is a proper fix.


Phobos doesn't call the wrong function, libc does. Phobos uses fwrite 
for output.



http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode


You need to address that in DMC. I wonder, does MSVCRT have the same 
problem?


-Steve


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Adam D. Ruppe via Digitalmars-d-learn
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole 
wrote:

Not a bug.


No, Phobos is *clearly* in the wrong here. There is a proper fix.

http://dpldocs.info/this-week-in-d/Blog.Posted_2019_11_25.html#unicode

Use the correct WriteConsoleW api instead of the ancient ascii 
api. WriteConsoleW works without changing any settings. (on old 
versions of Windows, you may have to install fonts to display it, 
but new ones come with it all preinstalled).
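A minimal sketch of the direct call (the linked post has the full version, including conversion handling and the pipe-to-file fallback, which this omits):

```d
// Sketch: write a D string to the console via WriteConsoleW.
// Only valid when stdout really is a console; redirected output
// would need a fallback to the byte-oriented path.
version (Windows)
{
    import core.sys.windows.windows;
    import std.utf : toUTF16;

    void writeConsole(string s)
    {
        auto h = GetStdHandle(STD_OUTPUT_HANDLE);
        auto w = s.toUTF16; // console API takes UTF-16
        DWORD written;
        WriteConsoleW(h, w.ptr, cast(DWORD) w.length, &written, null);
    }
}
// writeConsole("♥\n") displays the heart on a Unicode-capable console font.
```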


Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Adam D. Ruppe via Digitalmars-d-learn

On Sunday, 22 December 2019 at 06:11:13 UTC, moth wrote:
is there any function i can call or setting i can adjust to get 
D to do the same, or do i have to wait for something to be 
fixed in the language / compiler itself?


It isn't the language/compiler per se, it is the library calling 
the wrong function. See the code in the link in my last email - 
if you call the Windows WriteConsoleW function directly it will 
do what you want. The rest of the surrounding code in the link is 
to handle conversions and pipes to files.




Re: unicode characters are not printed correctly on the windows command line?

2019-12-22 Thread Mike Parker via Digitalmars-d-learn
On Sunday, 22 December 2019 at 06:25:42 UTC, rikki cattermole 
wrote:

On 22/12/2019 7:11 PM, moth wrote:

is there any function i can call or setting i can adjust to 
get D to do the same, or do i have to wait for something to be 
fixed in the language / compiler itself?




Not a bug. This is a known issue on the Windows side for people 
new to developing natively for it.


Yes, and it's not just D programs. And setting the code page 
isn't always perfect, as it matters which font cmd is configured 
to use. Google for "windows command prompt unicode output".


MS has updated the command prompt to support Unicode, but I don't 
know how to use it:


https://devblogs.microsoft.com/commandline/windows-command-line-unicode-and-utf-8-output-text-buffer/

If you're on Windows 10, there's also Windows Terminal, which was 
released on the app store in June:


https://devblogs.microsoft.com/commandline/windows-terminal-preview-v0-7-release/


Re: unicode characters are not printed correctly on the windows command line?

2019-12-21 Thread rikki cattermole via Digitalmars-d-learn

On 22/12/2019 7:11 PM, moth wrote:

hi all.

been learning d for the last few years but suddenly realised...

when i use this code:

writeln('♥');

the output displayed on the windows command line is "ÔÖÑ" [it works fine 
when piped directly into a text file, however].


i've looked about in this forum, but all that i could find was people in 
2016[!] saying the codepage had to be altered - clearly nonsense, since 
Rust [which i am also learning] has no problem whatsoever displaying "♥".


This is not nonsense. This is the correct solution if that is what you 
intend for your program to do.


Not everybody will want this. They may have set the code page themselves 
in some way. It may not have even occurred within a D application!


It's best we leave it as the default to play nice with other applications 
and libraries.


is there any function i can call or setting i can adjust to get D to do 
the same, or do i have to wait for something to be fixed in the language 
/ compiler itself?


best regards

moth [su.angel-island.zone]



Not a bug. This is a known issue on the Windows side for people new to 
developing natively for it.


I just checked the terminal emulator I use, ConEmu, and it doesn't have 
to change any settings to make Unicode "just work". It's conhost, with 
its legacy behavior, that you are facing.


unicode characters are not printed correctly on the windows command line?

2019-12-21 Thread moth via Digitalmars-d-learn

hi all.

been learning d for the last few years but suddenly realised...

when i use this code:

writeln('♥');

the output displayed on the windows command line is "ÔÖÑ" [it 
works fine when piped directly into a text file, however].


i've looked about in this forum, but all that i could find was 
people in 2016[!] saying the codepage had to be altered - clearly 
nonsense, since Rust [which i am also learning] has no problem 
whatsoever displaying "♥".


is there any function i can call or setting i can adjust to get D 
to do the same, or do i have to wait for something to be fixed in 
the language / compiler itself?


best regards

moth [su.angel-island.zone]