On approximately 3/24/2009 10:16 AM, came the following characters from the keyboard of INADA Naoki:
Hi. I'm Japanese and a non-ASCII character user. (cp932)

We have to use an "IME" to input non-ASCII characters in Windows.
After running "> chcp 65001" in cmd.exe, we cannot use the IME in cmd.exe.

So setting the code page to 65001 makes output universal but makes input ASCII-only.
Sit!!!

I hope PyQtShell <http://code.google.com/p/pyqtshell/> becomes a good
IDLE alternative.


Thanks for the feedback.

So at least one version of the code I posted shows that the code page can be set programmatically to different values for input and output, although the last version set both to 65001. It seems that chcp 65001 always sets both. If the IME only works with cp932, could you leave input at cp932 and set only output to 65001?
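
A minimal sketch of that split, assuming Python with ctypes on Windows and calling the Win32 functions SetConsoleCP and SetConsoleOutputCP directly. The helper name set_console_code_pages is made up for illustration, and this is not the code posted earlier in the thread. Whether the IME keeps working with this combination is exactly the open question.

```python
import ctypes
import sys

INPUT_CP = 932     # cp932, which the IME reportedly needs for input
OUTPUT_CP = 65001  # UTF-8 for output


def set_console_code_pages(input_cp, output_cp):
    """Set the console input and output code pages independently.

    Hypothetical helper. Returns True only if both Win32 calls succeed,
    and False on failure or when not running on Windows.
    """
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    # SetConsoleCP affects input only; SetConsoleOutputCP affects output only.
    ok_in = kernel32.SetConsoleCP(input_cp)
    ok_out = kernel32.SetConsoleOutputCP(output_cp)
    return bool(ok_in and ok_out)


if __name__ == "__main__":
    print(set_console_code_pages(INPUT_CP, OUTPUT_CP))
```

This only changes the code pages of the console attached to the current process, so it would have to run inside the console session where the IME is being tested.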

I have no idea if that could be a solution for you, but I would be interested in your results if you find that it is, or isn't, as that would add to the collective knowledge base about the subject. This is idea 2, below, where I tried to cover the solution space more broadly.

Looking briefly at the definition of cp932, it seems that it covers most of the Unicode characters... so perhaps any or several of the following could happen:

1) the IME could be converted to produce UTF-8 instead of cp932, allowing use of 65001 for input and output
2) the split code page could be used to avoid the conversion of Unicode to cp932 for output
3) Unicode could be converted to cp932 for output, allowing use of cp932 for both input and output

These are listed in order of increasing character-handling overhead.

Perhaps you could enlighten us all as to the issues with each of these ideas.

I realize the IME exists today, and is likely coded to use cp932, and that it would take some work to convert it to produce Unicode. However, there seems to be a straightforward conversion chart between cp932 and Unicode at Wikipedia, so perhaps that isn't a huge effort.
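
To illustrate the conversions involved in ideas 1 and 3, here is a small sketch using Python's built-in "cp932" codec. The function names are hypothetical, and the choice to replace unencodable characters with "?" is only one possible policy, assumed here for illustration.

```python
def cp932_to_utf8(data):
    """Idea 1 direction: re-encode cp932 bytes (e.g. from an IME) as UTF-8."""
    return data.decode("cp932").encode("utf-8")


def to_cp932(text):
    """Idea 3 direction: encode Unicode text as cp932 for output.

    Characters that have no cp932 mapping are replaced with b"?"
    (an assumed policy; errors="strict" would raise instead).
    """
    return text.encode("cp932", errors="replace")


if __name__ == "__main__":
    japanese = "日本語"
    # Round trip through cp932 preserves Japanese text.
    print(to_cp932(japanese).decode("cp932") == japanese)
    # A Hangul syllable is outside cp932, so it is replaced.
    print(to_cp932("한"))
```

The second case shows the cost of idea 3: anything outside cp932 is lost on output, which is what the split code page in idea 2 would avoid.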

It seems that the long term goal of having all software speak Unicode would increase the efficiency of all software when dealing with multi-lingual issues, as a common solution can be applied universally, rather than re-inventing solutions that only work for particular code pages.

But I'm not fully aware of whether or not the design or implementation of Unicode precludes universal solutions: I have heard rumors that certain characters must be interpreted differently in different locale contexts, which seems to be counter to the "one solution fits all" possibility.

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking