Re: how to localize console and GUI apps in Windows

2018-01-04 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 18:13:04 UTC, H. S. Teoh wrote:
If the problem is in readln(), then you probably need to read 
the input in binary (i.e., as ubyte[]) and convert it manually.


Could you kindly explain how I can read console input into binary 
ubyte[]?
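
One possible way, as a minimal sketch only: File.rawRead() from 
std.stdio fills a buffer with raw bytes and does no UTF 
validation. The helper name readRawLine below is hypothetical 
(it is not part of Phobos), and reading one byte per call is 
chosen for clarity, not speed.

```d
import std.stdio;

/// Hypothetical helper: reads one line from `f` as raw bytes, making no
/// assumption about the encoding. The terminator ("\n" or "\r\n") is dropped.
/// Returns false when the end of input is reached before any byte is read.
bool readRawLine(File f, out ubyte[] line)
{
    ubyte[1] b;
    bool gotAny = false;
    while (f.rawRead(b[]).length == 1)
    {
        gotAny = true;
        if (b[0] == '\n')
            break;
        line ~= b[0];
    }
    // Drop the carriage return left over from a Windows-style "\r\n".
    if (line.length && line[$ - 1] == '\r')
        line = line[0 .. $ - 1];
    return gotAny;
}

void main()
{
    write("say something: ");
    stdout.flush();
    ubyte[] line;
    if (readRawLine(stdin, line))
        writefln("%d bytes: %(%02X %)", line.length, line);
}
```

The bytes can then be handed to whatever code page conversion 
is in use before they are treated as a D string.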






Re: how to localize console and GUI apps in Windows

2018-01-04 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 18:13:04 UTC, H. S. Teoh wrote:
On Fri, Dec 29, 2017 at 10:35:53AM +0000, Andrei via 
Digitalmars-d-learn wrote:
This may be endurable if you write an application where 
Russian is only one of rare options, and what if your whole 
environment is totally Russian?


You mean if your environment uses a non-UTF encoding?  If your 
environment uses UTF, there is no problem.  I have code with 
strings in Russian (and other languages) embedded, and it's no 
problem because everything is in Unicode, all input and all 
output.


No, I mean the difficulty of writing a program for a non-ASCII 
locale. Ever since C, learning a programming language starts with 
a "hello world" program, which every non-English programmer 
naturally tries to translate into their native language, only to 
get an unreadable mess on the screen. Thousands try, hundreds 
look for a solution, dozens find it, and a few continue with the 
new language. That's not because these programmers cannot read 
English textbooks; they can. It's because they want to write 
non-English programs for non-English people, and that's 
essential. And there are many programming languages (or rather 
their runtimes) which do not suffer from this deficiency.


That's the reason for Unicode adoption all over the programming 
world, the D language included. But what good is it to me if I 
can write a UTF-8 string in my native language in a D program 
and still get the same unreadable mess on the screen?


Yes, a new language in development can lack support for some 
features, but this forum thread shows that a simple and handy 
solution exists, yet nobody cares to put it on the first pages 
of every textbook for beginners, at least as a footnote. Thus 
thousands of potential fans of the new language are lost from 
the start.


But I understand that in Windows you may not have this luxury. 
So you have to deal with codepages and what-not.


Converting back and forth is not a big problem, and it actually 
also solves the problem of string comparisons, because std.uni 
provides utilities for collating strings, etc. But it only 
works for Unicode, so you have to convert to Unicode internally 
anyway.  Also, for static strings, it's not hard to make the 
codepage mapping functions CTFE-able, so you can actually write 
string literals in a codepage and have the compiler 
automatically convert it to UTF-8.
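
A minimal sketch of what such a CTFE-able mapping function might 
look like, assuming CP866 as the code page. The name fromCP866 
is hypothetical, and only ASCII and the Cyrillic letters are 
mapped here; box-drawing and other symbols are replaced with 
U+FFFD.

```d
import std.stdio : writeln;
import std.utf : encode;

/// Hypothetical helper: converts CP866 (MS-DOS Cyrillic) bytes to UTF-8.
/// Only ASCII and the Cyrillic letters are mapped; anything else
/// becomes the replacement character U+FFFD.
string fromCP866(const(ubyte)[] bytes)
{
    string result;
    char[4] buf;
    foreach (b; bytes)
    {
        dchar c;
        if (b < 0x80)
            c = cast(dchar) b;                        // ASCII
        else if (b <= 0xAF)
            c = cast(dchar)(0x0410 + (b - 0x80));     // А..Я, а..п
        else if (b >= 0xE0 && b <= 0xEF)
            c = cast(dchar)(0x0440 + (b - 0xE0));     // р..я
        else if (b == 0xF0)
            c = cast(dchar) 0x0401;                   // Ё
        else if (b == 0xF1)
            c = cast(dchar) 0x0451;                   // ё
        else
            c = cast(dchar) 0xFFFD;                   // not mapped in this sketch
        const n = encode(buf, c);                     // UTF-8 encode one code point
        result ~= buf[0 .. n];
    }
    return result;
}

// An enum initializer must be a compile-time constant, so this call
// is evaluated by the compiler (CTFE), not at run time.
enum greeting = fromCP866([0x8F, 0xE0, 0xA8, 0xA2, 0xA5, 0xE2]);
static assert(greeting == "Привет");

void main()
{
    writeln(greeting);
}
```

The same function can be called at run time on bytes read from 
the console, so one mapping serves both literals and input.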


The other approach, if you don't like the idea of converting 
codepages all the time, is to explicitly work in ubyte[] for 
all strings. Or, preferably, create your own string type with 
ubyte[] representation underneath, and implement your own 
comparison functions, etc., then use this type for all strings. 
Better yet, contribute this to code.dlang.org so that others 
who have the same problem can reuse your code instead of 
needing to write their own.


I'd definitely try this if I decide to use the D language for my 
purposes (which is not settled yet). But to decide I need some 
experience, and for now it has stalled at reading the user's 
input (as an exercise I intend to translate a recent, rather 
complex interactive C# program of mine into D).


Still, this does not solve the localized input problem: any 
localized input throws an exception “std.utf.UTFException...  
Invalid UTF-8 sequence”.


Is the exception thrown in readln() or in writeln()? If it's in
writeln(), it shouldn't be a big deal, you just have to pass 
the data returned by readln() to fromKOI8 (or whatever other 
codepage you're using).


If the problem is in readln(), then you probably need to read 
the input in binary (i.e., as ubyte[]) and convert it manually. 
Unfortunately, there's no other way around this if you're 
forced to use codepages. The ideal situation is if you can just 
use Unicode throughout your environment. But of course, 
sometimes you have no choice.


It depends.

If I skip the console code page initialization, I see in the 
debugger that the runtime reads the user's input as CP866 
(MS-DOS) Cyrillic and then throws the "Invalid UTF-8 sequence" 
exception when trying to handle it as a UTF-8 string (in 
particular in the strip() or writeln() functions). This situation 
seems quite manageable with the code page conversions you've 
mentioned above. I tried the first library module I found 
(std.windows.charset) and got a rather fanciful working 
statement:


response = fromMBSz((readln()~"\0").ptr, 1).strip();

which assigns the correct Latin/Cyrillic contents to the 
response variable.


And if I initialize the console with a SetConsoleCP(65001) 
statement, things get worse, as I've said above. Then the 
readln() statement returns an empty string and something gets 
broken inside the runtime, because any further readln() 
statements do not wait for user input and return empty strings 
immediately.
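
One direction that might sidestep the input code page entirely, 
given here only as an untested sketch: the wide-character 
console call ReadConsoleW() returns UTF-16 regardless of the 
console code page, and std.conv.to can turn that into a UTF-8 
string. The helper names lineFromUtf16 and readConsoleLine are 
hypothetical, and the 1024-unit buffer is an arbitrary choice.

```d
import std.conv : to;
import std.stdio : writeln;

/// Hypothetical helper: converts a UTF-16 buffer, as filled by the
/// console, into a UTF-8 line without the trailing "\r\n".
string lineFromUtf16(const(wchar)[] buf)
{
    while (buf.length && (buf[$ - 1] == '\n' || buf[$ - 1] == '\r'))
        buf = buf[0 .. $ - 1];
    return buf.to!string;
}

version (Windows)
{
    import core.sys.windows.windows;

    /// Hypothetical helper: reads one line from an interactive console
    /// with the wide-character API. Returns null if the call fails,
    /// for example when standard input is redirected from a file.
    string readConsoleLine()
    {
        wchar[1024] buf;
        DWORD count;
        HANDLE h = GetStdHandle(STD_INPUT_HANDLE);
        if (!ReadConsoleW(h, buf.ptr, cast(DWORD) buf.length, &count, null))
            return null;
        return lineFromUtf16(buf[0 .. count]);
    }
}

void main()
{
    version (Windows)
        writeln("your response is: ", readConsoleLine());
    else
        writeln(lineFromUtf16("привет\r\n"w)); // conversion part only
}
```

Output still depends on SetConsoleOutputCP(65001) and a suitable 
console font; this sketch only concerns the input side.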







Re: how to localize console and GUI apps in Windows

2018-01-03 Thread Andrei via Digitalmars-d-learn

On Wednesday, 3 January 2018 at 09:11:32 UTC, thedeemon wrote:
Windows API contains two sets of functions: those whose names 
end with A (meaning ANSI), the other where names end with W 
(wide characters, meaning Unicode). The sample uses TextOutA, 
a function that expects an 8-bit encoding.


Gosh, I should have known this :)) Thanks for the point! 
TextOutW() works fine with wstring texts in this example, and no 
more changes are needed.


That's just enough for this example. Thank you!

Yet my particular interest is console interaction. With the 
help of this forum I've learned the console settings needed to 
write Cyrillic properly and simply to the console using UTF-8 
encoding.


One thing that remains is to read and process the user's input.

For now, in the example I've cited above, the response=readln(); 
statement returns an empty string in a console set to the UTF-8 
code page if the user's input contains any Cyrillic letters. Then 
the program's behavior differs depending on the compiler (or more 
likely on the runtime library): the one compiled with ldc 
continues to read and returns empty lines instead of the user's 
input, and the one compiled with dmd only returns empty lines 
without waiting for the user's input and without actually reading 
anything (i.e. it falls into an infinite loop, busily printing an 
empty response hundreds of times a second).


That's only for localized input. With ASCII input the same 
program works fine.


Maybe there are some more settings I must learn to make the 
console read non-ASCII input properly?




Re: how to localize console and GUI apps in Windows

2018-01-02 Thread Andrei via Digitalmars-d-learn

On Friday, 29 December 2017 at 11:14:39 UTC, zabruk70 wrote:

On Friday, 29 December 2017 at 10:35:53 UTC, Andrei wrote:
Though it is not suitable for GUI type of a Windows 
application.


AFAIK, Windows GUI have no ANSI/OEM problem.
You can use Unicode.


Partly, yes. Just as a test I tried to "russify" the example 
Windows GUI program that comes with the D installation package 
(samples\d\winsamp.d). Window captions, button captions and 
message box texts written in UTF-8 all show fine. But the direct 
text output functions CreateFont()/TextOut() render all Cyrillic 
from UTF-8 strings as garbage.



For Windows ANSI/OEM problem you can use also
https://dlang.org/phobos/std_windows_charset.html


Thank you very much, toMBSz() does the required translation for 
the TextOut() function, with some workarounds.






Re: how to localize console and GUI apps in Windows

2017-12-29 Thread Andrei via Digitalmars-d-learn

On Thursday, 28 December 2017 at 18:45:39 UTC, H. S. Teoh wrote:
On Thu, Dec 28, 2017 at 05:56:32PM +0000, Andrei via 
Digitalmars-d-learn wrote:

...
The string / wstring / dstring types in D are intended to be 
Unicode strings.  If you need to use other encodings, you 
really should be using ubyte[] or const(ubyte)[] or 
immutable(ubyte)[], instead of string.


Thank you, Teoh, for the advice and the good example! I was 
looking towards writing something like that if no solution 
exists. Still, this way of deliberate translation does not seem 
to be the best. It requires an explicit workaround for every 
ahchoo in Russian and constant conversion of ubyte[] to string 
and back. No formatting gems, no simple and elegant I/O 
statements or string/char comparisons. 
This may be endurable if you write an application where Russian 
is only one of rare options, and what if your whole environment 
is totally Russian?


Or some other non-ASCII locale... Many other cultures have the 
same mix of DOS/Windows/Unix code pages. The decision to use only 
Unicode for strings in the D language seems excellent precisely 
because of this, but the implementation turns out to be 
disappointing. Folks in such countries won't appreciate a 
language which is elegant only for English-language 
communication.


This problem is common to most programming languages and 
runtimes I know of. The only system which has solved the whole 
problem is .NET, I think.


The way proposed by zabruk70 below seems more appropriate, 
though more specific too: I feel it suits only console 
applications. Alas, this type of application proved to be buggy 
too.


On Thursday, 28 December 2017 at 22:49:30 UTC, zabruk70 wrote:

you can just set console CP to UTF-8:

https://github.com/CyberShadow/ae/blob/master/sys/console.d


Yes! This seems to be what is required, thank you very much! Though 
it is not suitable for GUI type of a Windows application.


Still, some testing showed that this approach fixes only console 
output. The simple read/write/compare script listed below works 
very well until the user enters something Russian. It then prints 
an **empty** response and falls into an infinite loop, printing 
the prompt and then immediately an empty response without 
actually reading anything.


But I think this is a subject for the “Issues” part of this forum.

p.s. I’ve found that I can set the “Consolas” font for a console 
window and then output properly localized UTF-8 strings without 
any special code in the D script to manage code pages. Still, 
this does not solve the localized input problem: any localized 
input throws an exception “std.utf.UTFException... Invalid UTF-8 
sequence”.


The script:

import core.sys.windows.windows;
import std.stdio;
import std.string;

int main(string[] args)
{
    const UTF8CP = 65001; // Windows code page identifier for UTF-8

    // Remember the current console code pages so they can be restored.
    UINT oldCP, oldOutputCP;
    oldCP = GetConsoleCP();
    oldOutputCP = GetConsoleOutputCP();

    // Switch both console input and console output to UTF-8.
    SetConsoleCP(UTF8CP);
    SetConsoleOutputCP(UTF8CP);

    writeln("hello world, привет всем!");

    bool quit = false;
    string response;
    while (!quit)
    {
        write("respond something: ");
        response = readln().strip();
        writefln("your response is \"%s\"", response);
        if (response == "quit")
        {
            writeln("goodbye then!");
            quit = true;
        }
    }

    // Restore the original console code pages.
    SetConsoleCP(oldCP);
    SetConsoleOutputCP(oldOutputCP);

    return 0;
}



how to localize console and GUI apps in Windows

2017-12-28 Thread Andrei via Digitalmars-d-learn
There is one everlasting problem with writing Cyrillic programs 
in Windows: Microsoft successively invented two quite different 
code pages for Russia and other Cyrillic-alphabet countries: 
first MSDOS-866 (and the like), then Windows-1251. Nowadays MS 
Windows uses the first code page for console programs and the 
second for GUI applications, and there are always many 
workarounds to get proper translation between them. Mostly a 
programmer must either write program sources in one code page for 
console and another for GUI, or use .NET, which basically uses 
UTF8 in sources and translates seamlessly depending on the back 
end.


In the D language, which uses only UTF8 for string encoding, I 
can write program texts neither in the MS866 code page nor in 
Windows-1251: both cases end in a compiler error like "Invalid 
trailing code unit" or "Outside Unicode code space". And writing 
Cyrillic strings in UTF8 format is fatal for both console and GUI 
Windows targets.


My question is: are there any standard means in the D libraries 
to translate Cyrillic or any other localized UTF8 strings for 
console and GUI output? If so, where can I get more information 
and a good example? Google did not help.


Thanks.