[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

I'm going to commit my patch in a few days. This is not a perfect solution, but I believe it is better than the current state.

--
nosy: +benjamin.peterson
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15809
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

Terry, what do you think? This bug is critical for non-ASCII-only users of IDLE.
[issue15809] IDLE console uses incorrect encoding.
irdb added the comment:

Well, if there is no other way around this, I think it's better to apply Martin's patch. At least then we will be able to enter any valid utf-8 character in IDLE (although the print statement won't print correctly unless the string is decoded first).

(As a Windows user, I currently can't print u'йцук' in interactive mode; I get an "Unsupported characters in input" error because my default system encoding (cp1256) can't encode Russian.)

It will also be more Unicode friendly, which is not a bad thing.
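The "Unsupported characters in input" failure described above comes down to cp1256 having no Cyrillic letters. A minimal sketch of that check follows. It is written for Python 3 rather than the Python 2.7 IDLE under discussion, and the helper name `can_encode` is hypothetical, not something from IDLE.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


russian = 'йцук'

# cp1256 (Arabic) has no Cyrillic letters, cp1251 (Cyrillic) and UTF-8 do.
for codec in ('cp1256', 'cp1251', 'utf-8'):
    print(codec, can_encode(russian, codec))
```

This only illustrates why a shell tied to one legacy codepage rejects text from another script; it does not reproduce IDLE's own input handling.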
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

Here is a patch with a hack which corrects the line number in the displayed traceback. It doesn't solve the problem completely, but perhaps it is a good enough approximation.

--
Added file: http://bugs.python.org/file32671/idle_compile_coding_3.patch
[issue15809] IDLE console uses incorrect encoding.
Changes by Serhiy Storchaka storch...@gmail.com:

--
stage: needs patch -> patch review
[issue15809] IDLE console uses incorrect encoding.
irdb added the comment:

Thank you, Serhiy, for working on this. This patch solves the problem when all your input characters are encodable using the system's preferred encoding. But issue19625 persists: I still can't print something like 'Русский текст' in interactive mode.

Another problem is that the output of interactive mode and of running a module differs if the source encoding is something other than the system's default. For example, with the current patch, in interactive mode:

>>> print 'آ'
آ
>>> print u'آ'
آ

But when running the same commands from a file with utf-8 encoding:

ط¢
آ

I know, you said to use cp1256 encoding in my source file. But I really prefer to work with utf-8 rather than with a codepage that is not fully compatible with my language, and I believe many other programmers feel the same (even if there is a codepage they can work with). (cp1256 is only for Arabic, but when a program does not support Unicode, Microsoft uses it as a second option for some similar languages such as Urdu and Persian. It does not include all the characters they need.)

IMO these two problems -- being able to type any Unicode character in interactive mode and getting the same output as when running modules -- are more important than a mere representation problem in interactive mode. (Considering that I use utf-8 for my source files, both of them were solved by Martin's patch. Of course I'm not saying it's the solution, just that it worked better for me.)
[issue15809] IDLE console uses incorrect encoding.
Geraldo Xexeo added the comment:

The same program will behave differently on Windows and Mac:

utf-8 works on Mac (10.6.8), cp1256 does not print some lines
cp1256 works on Windows 7, utf-8 prints some characters in a wrong way

For the record, I use accented letters from the Portuguese alphabet (ã, for example).

--
nosy: +Geraldo.Xexeo
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

Use cp1256 encoding in your source file. It is expected that your source file's encoding is usually the same as your locale encoding. In that case, printing string literals and Unicode string literals produces the same result (as they look in the sources).

s1 is '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85' (u'سلام'.encode('utf-8')); when it is printed in a utf-8 locale it produces سلام, but when it is printed in a cp1256 locale it produces mojibake. When you convert your source file to cp1256 and change the header, s1 will be '\xd3\xe1\xc7\xe3' (u'سلام'.encode('cp1256')) and will produce سلام when printed in your cp1256 locale.
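The byte values quoted above can be checked directly. The following is a small sketch in Python 3 (the thread itself concerns Python 2.7, where a plain string literal is already a byte string); the variable names are illustrative only.

```python
text = 'سلام'

# The same four letters under the two source encodings discussed above.
utf8_bytes = text.encode('utf-8')
cp1256_bytes = text.encode('cp1256')
print(utf8_bytes)
print(cp1256_bytes)

# A cp1256 console interprets each byte as one cp1256 character, so the
# eight UTF-8 bytes show up as eight unrelated characters (mojibake).
mojibake = utf8_bytes.decode('cp1256')
print(mojibake)

# The cp1256 bytes, read back as cp1256, give the original text.
print(cp1256_bytes.decode('cp1256'))
```

The point of the sketch is that the bytes are fixed by the source file's encoding, while what appears on screen depends on the encoding the console assumes.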
[issue15809] IDLE console uses incorrect encoding.
irdb added the comment:

Sorry, it probably has the same problem as what Martin suggested. Still, I think it's better than having the program interrupted.
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

> How about passing on UnicodeError?

I don't see how it will resolve any problem.
[issue15809] IDLE console uses incorrect encoding.
irdb added the comment:

Oops! My problem is that I get "Unsupported characters in input" when trying to type any Unicode string in IDLE's interactive mode. For example, a simple command like:

>>> s = u'Русский текст'

gives:

Unsupported characters in input

Probably unrelated to the issue at hand. I don't know how I ended up in this thread. Again, sorry.
[issue15809] IDLE console uses incorrect encoding.
irdb added the comment:

I really think this information might help; if not, I promise not to post anything else. :)

This is a sample program I run:

'''
# -*- coding: utf-8 -*-
import sys
import locale

de = sys.getdefaultencoding()
pd = locale.getpreferredencoding()
print de, pd
s1 = 'سلام'
print s1
s2 = u'سلام'
print s2
'''

I tried to run it before and after applying the suggested patches.

Before applying any patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
RESTART
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd3\xe1\xc7\xe3'
>>> s4
u'\xd3\xe1\xc7\xe3'
>>> print s3
سلام
>>> print s4
ÓáÇã
>>> s = u'Русский текст'
Unsupported characters in input
'''

After applying loewis's patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
RESTART
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
>>> s4
u'\u0633\u0644\u0627\u0645'
>>> print s3
ط³ظ„ط§ظ…
>>> print s4
سلام
>>> s = u'Русский текст'
>>> print s
Русский текст
'''

After applying serhiy.storchaka's patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
RESTART
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd3\xe1\xc7\xe3'
>>> s4
u'\u0633\u0644\u0627\u0645'
>>> print s3
سلام
>>> print s4
سلام
>>> s = u'Русский текст'
Unsupported characters in input
'''

My point is that printing s3 and s4 in interactive mode should produce the same results as printing s1 and s2 from a source file. Loewis's patch handled this as I expected. That patch also solves my problem of not being able to print u'Русский текст' (which is due to my Windows locale being set to Persian, not Russian).
[issue15809] IDLE console uses incorrect encoding.
Changes by ali mr.da...@gmail.com:

--
nosy: +irdb
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

Good catch. Thank you, Roger.

--
stage: patch review -> needs patch
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

If there are no objections I will commit the patch soon.
[issue15809] IDLE console uses incorrect encoding.
Roger Serwy added the comment:

There is a problem. Adding the encoding comment to the top of the source causes off-by-one line errors in the traceback. Take as an example:

>>> 1/0
Traceback (most recent call last):
  File "<pyshell#0>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero
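The off-by-one effect can be reproduced outside IDLE by compiling the same statement with and without an extra first line. This is a sketch in Python 3, not the patch's code; the helper `reported_line` and the reuse of the `<pyshell#0>` filename are assumptions made for illustration.

```python
import sys
import traceback


def reported_line(source, filename='<pyshell#0>'):
    """Run source and return the line number the traceback reports."""
    compiled = compile(source, filename, 'exec')
    try:
        exec(compiled, {})
    except ZeroDivisionError:
        # The last traceback entry is the frame that raised the error.
        frames = traceback.extract_tb(sys.exc_info()[2])
        return frames[-1].lineno
    return None


user_input = '1/0'
# Prepending any line, such as an encoding comment, shifts the statement down.
with_header = '# -*- coding: utf-8 -*-\n' + user_input

print(reported_line(user_input))
print(reported_line(with_header))
```

The user typed a single line, so a traceback pointing at line 2 refers to a line the user never saw.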
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

> Here's a tangentially related issue: #14326

Yes, this issue looks like a prerequisite for it.

> IDLE doesn't handle pasting multi-line code properly (issue3559), IDLE2 will silently ignore code after the first executable statement. IDLE3 may give an error.

Well, then the patch doesn't introduce a significant regression.

> Can't we just make IDLE's shell default to UTF-8?

This is easy. Just set IOBinding.encoding to utf-8. But I'm not sure we won't lose something in this case.
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

The problem (as I understand it) is that while Martin's patch fixes unicode literals, it breaks string literals.

$ LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python Lib/idlelib/idle.py

>>> print u'йцук'
йцук
>>> print 'йцук'
йцук

Here is a different patch, which fixes unicode strings and preserves byte strings.

>>> print u'йцук'
йцук
>>> print 'йцук'
йцук

--
nosy: +kbk, roger.serwy
Added file: http://bugs.python.org/file29974/idle_compile_coding_2.patch
[issue15809] IDLE console uses incorrect encoding.
R. David Murray added the comment:

I've combined the nosy lists and will close the other issue.

Serhiy, what if the source already has a coding cookie line?

--
nosy: +Pradyun.Gedam, Tomoki.Imai, ned.deily, r.david.murray
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

A coding cookie line will be ignored. Note that this affects only the very obscure case of pasting multiline code with a coding cookie line into the IDLE shell (this is the only way to get a coding cookie line into the shell). Running a file does not pass through runsource(). runsource() is used only for code entered in the shell (maybe Martin or Roger can correct me if I am wrong).
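For readers unfamiliar with runsource(), the standard library's `code.InteractiveInterpreter` has a method of that name which compiles and runs one piece of interactive input. The sketch below uses only that stdlib class in Python 3; treating it as a stand-in for IDLE's own shell interpreter is an assumption, and IDLE's actual method is not shown here.

```python
import code

namespace = {}
interp = code.InteractiveInterpreter(namespace)

# A complete statement is compiled and executed; runsource() returns False.
needs_more = interp.runsource('x = 1 + 1')
print(needs_more, namespace['x'])

# An incomplete statement is not executed; runsource() returns True,
# meaning the shell should prompt for a continuation line.
needs_more_input = interp.runsource('def f():')
print(needs_more_input)
```

Because only text typed or pasted at the prompt travels this path, a change confined to runsource() would leave files run from an editor window unaffected, which is the point made above.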
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

An interactive Python console ignores a coding cookie line too.

$ cat test.py
# -*- coding: koi8-r -*-
print repr('йцук'), 'йцук', repr(u'йцук'), u'йцук'
$ LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python test.py
'\xe9\xf6\xf3\xea' йцук u'\u0418\u0416\u0421\u0419' ИЖСЙ
$ LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python
Python 2.7.4+ (2.7:0f31f38e8a17+, Apr 13 2013, 21:06:36)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: koi8-r -*-
... print repr('йцук'), 'йцук', repr(u'йцук'), u'йцук'
'\xe9\xf6\xf3\xea' йцук u'\u0439\u0446\u0443\u043a' йцук

('\xe9\xf6\xf3\xea' is 'йцук' in cp1251 and 'ИЖСЙ' in koi8-r)
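The closing remark can be verified by decoding the same four bytes with both codecs. A minimal Python 3 sketch, with illustrative variable names:

```python
raw = b'\xe9\xf6\xf3\xea'

# The bytes as typed in a cp1251 terminal.
as_cp1251 = raw.decode('cp1251')
# The same bytes when a koi8-r coding cookie is honoured.
as_koi8r = raw.decode('koi8-r')

print(as_cp1251)
print(as_koi8r)
# Code points of the koi8-r reading, for comparison with the repr above.
print(['%04x' % ord(ch) for ch in as_koi8r])
```

When the file is run, the cookie is honoured and the unicode literal is decoded as koi8-r; at the interactive prompt the cookie is ignored and the terminal's cp1251 is used instead, which is why the two sessions print different unicode values.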
[issue15809] IDLE console uses incorrect encoding.
Roger Serwy added the comment:

Here's a tangentially related issue: #14326

IDLE doesn't handle pasting multi-line code properly (issue3559), IDLE2 will silently ignore code after the first executable statement. IDLE3 may give an error.

Can't we just make IDLE's shell default to UTF-8?
[issue15809] IDLE console uses incorrect encoding.
Changes by Serhiy Storchaka storch...@gmail.com:

--
assignee: -> serhiy.storchaka
[issue15809] IDLE console uses incorrect encoding.
Serhiy Storchaka added the comment:

> However, this patch isn't right, since it will cause all source to be interpreted as UTF-8. This would be wrong when the sys.stdin.encoding is not UTF-8, and byte string objects are created in interactive mode.

Can you show how to reproduce the error that you're talking about? I have found no issues running the bare Python and IDLE (with your patch, of course) with files in different encodings under different locales.

--
nosy: +serhiy.storchaka
[issue15809] IDLE console uses incorrect encoding.
Changes by Serhiy Storchaka storch...@gmail.com:

--
stage: -> patch review
[issue15809] IDLE console uses incorrect encoding.
Changes by Terry J. Reedy tjre...@udel.edu:

--
nosy: +terry.reedy
[issue15809] IDLE console uses incorrect encoding.
Changes by Andrew Svetlov andrew.svet...@gmail.com:

--
title: Строки из IDLE поступают в неверной кодировке. ("Strings from IDLE arrive in the wrong encoding.") -> IDLE console uses incorrect encoding.
[issue15809] IDLE console uses incorrect encoding.
alex hartwig added the comment:

Text is correct. See attachment.
[issue15809] IDLE console uses incorrect encoding.
Martin v. Löwis added the comment:

The problem is that IDLE passes a UTF-8 encoded source string to compile, and compile, in the absence of a source encoding, uses the PEP 263 default source encoding, i.e. Latin-1. As a consequence, the variable s has the value

u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 \xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'

IDLE's Default Source Encoding is irrelevant - it only applies to editor windows.

One solution for that is the attached patch. However, this patch isn't right, since it will cause all source to be interpreted as UTF-8. This would be wrong when the sys.stdin.encoding is not UTF-8, and byte string objects are created in interactive mode.

Interactive mode manages to get it right by looking up sys.stdin.encoding during compilation, but it does so only when in interactive mode (i.e. when tok->prompt != NULL).

I don't see any way to fix this problem in Python 2. It is fixed in Python 3, basically by always assuming that the source encoding is UTF-8, by making all string objects Unicode objects, and by disallowing non-ASCII characters in bytes literals.

--
keywords: +patch
nosy: +loewis
Added file: http://bugs.python.org/file27045/compile_unicode.diff
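The value of s quoted above is what results from reading UTF-8 bytes as Latin-1, one character per byte. The sketch below reproduces it in Python 3 with explicit encode and decode calls; it imitates the effect only and is not the code path inside compile.

```python
typed = 'Русский текст'

# What IDLE hands to compile(): the text encoded as UTF-8.
utf8_source = typed.encode('utf-8')

# What a Latin-1 default makes of it: every byte becomes its own character.
misread = utf8_source.decode('latin-1')
print(repr(misread))
print(len(typed), len(misread))

# The damage is reversible as long as the wrong guess was Latin-1,
# because Latin-1 maps all 256 byte values to characters.
recovered = misread.encode('latin-1').decode('utf-8')
print(recovered)
```

Each Cyrillic letter occupies two bytes in UTF-8, so the 13-character input turns into a 25-character string of accented Latin letters and control characters.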