[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

Martin Krejcirik changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m...@krej.cz
           See Also|                            |https://issues.dlang.org/show_bug.cgi?id=15761
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

anonymous4 changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://issues.dlang.org/show_bug.cgi?id=7084
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

--- Comment #15 from Walter Bright ---
When I start a command prompt in Windows, I run the command:

    chcp 65001

which sets it to Unicode (code page 65001 is UTF-8).
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

Vladimir Panteleev changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #14 from Vladimir Panteleev ---
I think we should let this one go.

1. To see international characters in the first place, you have to change the console font from a raster one.

2. Setting the output console CP to 65001 is not an option because it breaks spawned programs. In particular, batch files stop working. Problems also occur if the console isn't changed back.

3. Changing the data's output encoding according to the user's locale cannot be done if the output is a file or pipe, as it would be a breaking change.

4. As a result, the only way to do this is to check if the output is the console. However, because we do output via the C standard library, whatever stdout points to may change at any moment, so we cannot cache the check.

5. Since all output is done via the C standard library, it is its responsibility to handle this; however, it does not. We do not have control over the MS standard C library, which does not implement this check.

I think this is unactionable unless either we move away from using C for input/output (see: std.io), or someone presents a C example program that produces correct Unicode output to both console and file and which works with all C runtimes that D uses (AFAIU, this is impossible).

> If this is not going to be fixed, it should be documented.

The problem is with Windows and the C libraries, not D.
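The console check described in point 4 can be sketched in D roughly as follows. This is an illustration only, not how std.stdio behaves: `writeText` and `isConsoleHandle` are hypothetical names, the check is repeated on every call because it cannot safely be cached, and the console branch bypasses the C library by handing UTF-16 text to WriteConsoleW. On non-Windows systems the code simply falls through to stdout.write.

```d
import std.stdio : stdout;
import std.utf : toUTF16;

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: true if the handle refers to a console.
    /// GetConsoleMode fails for files and pipes.
    bool isConsoleHandle(HANDLE h)
    {
        DWORD mode;
        return GetConsoleMode(h, &mode) != 0;
    }
}

/// Hypothetical helper: write UTF-8 text, using the wide console API
/// when standard output is a console and the C stream otherwise.
void writeText(string s)
{
    version (Windows)
    {
        HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
        if (isConsoleHandle(h))
        {
            // Flush pending C-stream output so ordering is preserved.
            stdout.flush();
            wstring w = toUTF16(s);
            DWORD written;
            WriteConsoleW(h, w.ptr, cast(DWORD) w.length, &written, null);
            return;
        }
    }
    // File, pipe, or non-Windows: bytes go out unchanged as UTF-8.
    stdout.write(s);
}

void main()
{
    writeText("£ é ü\n");
}
```

Redirected output keeps its UTF-8 bytes, so point 3 is respected; the cost is a second output path beside the C runtime, which is the kind of move away from C I/O mentioned above.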
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

Andrei Alexandrescu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |bootcamp
           Assignee|and...@erdani.com           |nob...@puremagic.com
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

Andrei Alexandrescu and...@erdani.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|2.025                       |D2
[Issue 2742] std.stdio assumes console works in utf-8
https://issues.dlang.org/show_bug.cgi?id=2742

Vladimir Panteleev thecybersha...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://issues.dlang.org/show_bug.cgi?id=1448
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

Walter Bright bugzi...@digitalmars.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|spec                        |
                 CC|                            |bugzi...@digitalmars.com

--- Comment #13 from Walter Bright bugzi...@digitalmars.com 2012-01-23 00:29:35 PST ---
Not a language spec issue.
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #10 from Stewart Gordon s...@iname.com 2011-05-25 04:59:10 PDT ---
(In reply to comment #9)
> According to this page http://codesnippets.joyent.com/posts/show/414
> you can get and set the codepage via the
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's
> OEMCP value. Setting the codepage requires a restart though.

Not if you do it using chcp on the command line, or (presumably) SetConsoleCP in the Windows API.

> Also, changing the codepage has other effects, e.g. using ALT+Numpad keys is
> handled differently (with codepage 1252 you don't have to prepend a zero when
> using ALT+Numkey apparently).
<snip>

I don't have to prepend a zero anyway. It just produces a different character if I do. Traditionally at least, with a 0 it types a character from the ANSI set, and without a 0 it types a character from the OEM set. But as I test it (Win7), it depends on what font the command prompt is set to.

- Lucida Console or Consolas -
C:\Users\StewartGordon>chcp 850
Active code page: 850

C:\Users\StewartGordon>£úœ£
'£úœ£' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\StewartGordon>chcp 1252
Active code page: 1252

C:\Users\StewartGordon>£úœ£

- Raster Fonts -
C:\Users\StewartGordon>chcp 850
Active code page: 850

C:\Users\StewartGordon>£úo£
'£úo£' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\StewartGordon>chcp 1252
Active code page: 1252

C:\Users\StewartGordon>ú·£ú

The sequence of strange characters is Alt+0163, Alt+163, Alt+0156, Alt+156 in each case.
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

Vladimir Panteleev thecybersha...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |thecybersha...@gmail.com

--- Comment #11 from Vladimir Panteleev thecybersha...@gmail.com 2011-05-25 05:02:20 PDT ---
Since no one seems to have mentioned this here yet:

http://msdn.microsoft.com/en-us/library/ms686036(v=vs.85).aspx
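For reference, switching the console code page from inside a program (rather than with chcp) might look like the sketch below. It assumes the Windows API functions GetConsoleOutputCP and SetConsoleOutputCP from the druntime Windows bindings; `withUtf8Console` is a hypothetical helper name, not part of Phobos. The previous code page is restored on exit, since leaving the console at 65001 is reported elsewhere in this issue to cause problems. On other platforms the helper just runs the delegate.

```d
import std.stdio : stdout, writeln;

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: run dg with the console output code page
    /// set to UTF-8 (65001), then put the previous code page back.
    void withUtf8Console(scope void delegate() dg)
    {
        immutable previous = GetConsoleOutputCP();
        SetConsoleOutputCP(CP_UTF8);
        scope (exit)
        {
            // Flush before restoring so buffered text is not shown
            // under the old code page.
            stdout.flush();
            SetConsoleOutputCP(previous);
        }
        dg();
    }
}
else
{
    /// Nothing to switch outside Windows; just run the delegate.
    void withUtf8Console(scope void delegate() dg)
    {
        dg();
    }
}

void main()
{
    withUtf8Console({ writeln("£ é ü"); });
}
```

This does not address programs spawned while the code page is 65001, and the console font still has to be a non-raster one for the characters to display.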
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #9 from Andrej Mitrovic andrej.mitrov...@gmail.com 2011-05-24 20:01:07 PDT ---
According to this page http://codesnippets.joyent.com/posts/show/414 you can get and set the codepage via the [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage] key's OEMCP value. Setting the codepage requires a restart though.

Also, changing the codepage has other effects, e.g. using ALT+Numpad keys is handled differently (with codepage 1252 you don't have to prepend a zero when using ALT+Numkey apparently).

Here's how to fetch the value:

    import std.stdio;
    import std.windows.registry;

    void main()
    {
        // Open the key under HKEY_LOCAL_MACHINE that holds the code page settings.
        Key HKLM = Registry.localMachine;
        Key SFW = HKLM.getKey(r"SYSTEM\CurrentControlSet\Control\Nls\CodePage");

        // OEMCP is stored as a REG_SZ string, e.g. "850".
        auto codePage = SFW.getValue("OEMCP").value_SZ();
        writeln(codePage);
    }

Note that the key type is REG_SZ, a string, not a binary value. So if you want to set the code page programmatically, you have to call:

    SFW.setValue("OEMCP", "1252");

One more thing, there was this comment: "Change the code page in your registry and you may not be able to reboot your windows anymore."

That sounds kind of scary. Perhaps all of this should be left to the user to do, and just documented somewhere in the docs.
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #6 from Sobirari Muhomori dfj1es...@sneakemail.com 2010-09-29 10:58:53 PDT ---
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=114211
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #7 from Sobirari Muhomori dfj1es...@sneakemail.com 2010-09-29 11:02:43 PDT ---
This can be a good test for dchar[]-looking ranges.
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #8 from Sobirari Muhomori dfj1es...@sneakemail.com 2010-09-29 11:39:42 PDT ---
Looking at std.stdio, an easy fix would be to make sure all IO goes through File.write, which calls LockingTextWriter.put, which actually tries to do the correct transcoding. You just need to store the target codepage in File and use it in LockingTextWriter.put. The first step is to statically import core.stdc.stdio, to minimize and control its usage.

A nicer design, though, would be correctly implemented .NET-style Streams/TextStreams, in whatever form you want them to work in D.
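The transcoding step proposed here can be illustrated in isolation. The sketch below is not LockingTextWriter code: `toWindows1252` is a hypothetical helper, and Windows-1252 is hard-coded as the target only because std.encoding ships an encoder for it. In the proposal the target code page would instead be stored in File.

```d
import std.encoding : transcode, Windows1252String;
import std.stdio : stdout;

/// Hypothetical helper: transcode UTF-8 text to Windows-1252 bytes.
/// The target code page is fixed here only to keep the sketch short.
immutable(ubyte)[] toWindows1252(string utf8)
{
    Windows1252String encoded;
    transcode(utf8, encoded);
    return cast(immutable(ubyte)[]) encoded;
}

void main()
{
    // rawWrite emits the bytes unchanged, so the transcoded text is
    // not re-encoded on its way out.
    stdout.rawWrite(toWindows1252("£ é ü\n"));
}
```

std.encoding has no built-in encoder for OEM code pages such as 850, so a real implementation would probably need the Windows conversion APIs or its own tables for the console's actual code page.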
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #4 from Andrei Alexandrescu and...@metalanguage.com 2010-09-26 14:24:54 PDT ---
Any fresh ideas on how to fix this?
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

Andrei Alexandrescu and...@metalanguage.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |and...@metalanguage.com
         AssignedTo|nob...@puremagic.com        |and...@metalanguage.com
[Issue 2742] std.stdio assumes console works in utf-8
http://d.puremagic.com/issues/show_bug.cgi?id=2742

--- Comment #2 from ma...@pochta.ru 2009-03-30 06:17 ---
But how many DOS or Windows console apps in the real world output UTF-8? Presumably not many, considering that no versions of DOS and only a few versions of Windows support it. There's also a causal loop in that even modern Windows versions don't come with the console code page set to 65001 by default. I don't know what is likely to break this loop, but I doubt that the restrictiveness of one language's standard library is going to do it.

There is PoshConsole http://poshconsole.codeplex.com/
It's all .NET and WPF, therefore UTF-16, but it has a very different architecture and interface.

BTW cmd has a /u switch for (redirected) Unicode output; I use it sometimes.