[issue15809] IDLE console uses incorrect encoding.

2014-05-14 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I'm going to commit my patch in a few days. This is not a perfect solution, but
I believe it is better than the current state.

--
nosy: +benjamin.peterson

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue15809
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue15809] IDLE console uses incorrect encoding.

2014-03-03 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Terry, what do you think? This bug is critical for non-ASCII-only users of IDLE.

--


[issue15809] IDLE console uses incorrect encoding.

2013-11-17 Thread irdb

irdb added the comment:

Well, if there is no other way around this, I think it's better to apply
Martin's patch. At least then we will be able to enter any valid utf-8
character in IDLE (although the print statement won't print correctly unless
the string is decoded first).

(As a Windows user, I currently can't print u'йцук' in interactive mode; I get
an "Unsupported characters in input" error because my default system encoding
(cp1256) can't encode Russian.)

Also, it will be more Unicode-friendly, which is not a bad thing.
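The "Unsupported characters in input" error follows from cp1256 simply having no Cyrillic letters. A minimal Python 3 sketch, assuming (as this thread describes) that the shell encodes each typed line with the system encoding before compiling it; the helper name `encode_shell_input` and the hard-coded cp1256 default are illustrative assumptions, not IDLE's actual code:

```python
def encode_shell_input(line, encoding="cp1256"):
    """Hypothetical helper: encode a typed shell line with the system
    encoding (cp1256 assumed, as on a Persian Windows locale).
    Returns the encoded bytes, or None if the line cannot be represented."""
    try:
        return line.encode(encoding)
    except UnicodeEncodeError:
        # This is the case the shell reports as
        # "Unsupported characters in input".
        return None

print(encode_shell_input("s = u'سلام'"))           # Arabic script: encodable
print(encode_shell_input("s = u'Русский текст'"))  # Cyrillic: None
```

Arabic-script text round-trips through cp1256, while any Cyrillic letter makes the whole line fail, which matches the behaviour reported for u'йцук'.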

--


[issue15809] IDLE console uses incorrect encoding.

2013-11-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch with a hack which corrects the line number in the displayed
traceback. It doesn't solve the problem completely, but perhaps it is a good
enough approximation.

--
Added file: http://bugs.python.org/file32671/idle_compile_coding_3.patch


[issue15809] IDLE console uses incorrect encoding.

2013-11-17 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
stage: needs patch -> patch review


[issue15809] IDLE console uses incorrect encoding.

2013-11-17 Thread irdb

irdb added the comment:

Thank you, Serhiy, for working on this. This patch solves the problem when all
your input characters are encodable in the system's preferred encoding. But
issue19625 persists: I still can't print something like 'Русский текст' in
interactive mode.

Another problem is that the output in interactive mode differs from the output
of running a module if the source encoding is something other than the
system's default. For example:

(With the current patch) in interactive mode:

>>> print 'آ'
آ
>>> print u'آ'
آ

But when running the same commands from a file with utf-8 encoding:

ط¢
آ

I know, you said to use cp1256 encoding in my source file. But I really prefer
to work with utf-8 rather than with a codepage that is not fully compatible
with my language, and I believe many other programmers feel the same (even if
there is a codepage they can work with).

(cp1256 is designed for Arabic, but when a program does not support Unicode,
Microsoft uses it as a fallback for some similar languages such as Urdu and
Persian. However, it does not include all the characters they need.)

IMO these two problems (being able to type any Unicode character in
interactive mode, and getting the same output as when running modules) are
more important than a mere representation problem in interactive mode. (And
since I use utf-8 for my source files, both of them were solved by Martin's
patch. Of course I'm not saying it's the solution, just that it worked better
for me.)

--


[issue15809] IDLE console uses incorrect encoding.

2013-11-01 Thread Geraldo Xexeo

Geraldo Xexeo added the comment:

The same program behaves differently on Windows and Mac.

utf-8 works on Mac (10.6.8); cp1256 does not print some lines.

cp1256 works on Windows 7; utf-8 prints some characters incorrectly.

For the record, I use accented letters from the Portuguese alphabet (ã, i.e.
&atilde;, for example).

--
nosy: +Geraldo.Xexeo


[issue15809] IDLE console uses incorrect encoding.

2013-08-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Use cp1256 encoding in your source file. Normally your source file's encoding
is expected to be the same as your locale encoding. In that case, printing
string literals and Unicode string literals produces the same result (as they
look in the source).

s1 is '\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85' (u'سلام'.encode('utf-8')). When it is
printed in a utf-8 locale it produces سلام, but when it is printed in a cp1256
locale it produces mojibake.

When you convert your source file to cp1256 and change the coding header, s1
will be '\xd3\xe1\xc7\xe3' (u'سلام'.encode('cp1256')) and will produce سلام
when printed in your cp1256 locale.
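The byte values in this explanation can be checked directly. A small Python 3 sketch (Python 3 `str` standing in for Python 2 `unicode`, and `bytes` for Python 2 `str`); decoding with cp1256 is used here as a model of what a cp1256 console does with the bytes it is given:

```python
word = "سلام"

utf8_bytes = word.encode("utf-8")     # what s1 holds in a utf-8 source file
cp1256_bytes = word.encode("cp1256")  # what s1 holds in a cp1256 source file

print(utf8_bytes)    # b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
print(cp1256_bytes)  # b'\xd3\xe1\xc7\xe3'

# A cp1256 console reads every byte as one cp1256 character:
print(utf8_bytes.decode("cp1256"))    # mojibake: 8 characters instead of 4
print(cp1256_bytes.decode("cp1256"))  # the original word
```

The mojibake produced by the first decode is the same ط³ظ„ط§ظ… string that appears in the IDLE sessions posted in this thread.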

--


[issue15809] IDLE console uses incorrect encoding.

2013-08-21 Thread irdb

irdb added the comment:

Sorry, it probably has the same problem as what Martin suggested. Still, I
think it's better than having the program interrupted.

--


[issue15809] IDLE console uses incorrect encoding.

2013-08-21 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> How about passing on UnicodeError?

I don't see how it would resolve any problem.

--


[issue15809] IDLE console uses incorrect encoding.

2013-08-21 Thread irdb

irdb added the comment:

Oops! My problem is that I get "Unsupported characters in input" when trying
to type any Unicode string in IDLE's interactive mode.

e.g. a simple command like s = u'Русский текст' gives: Unsupported characters
in input

This is probably unrelated to the issue at hand; I don't know how I ended up in
this thread. Again, sorry.

--


[issue15809] IDLE console uses incorrect encoding.

2013-08-21 Thread irdb

irdb added the comment:

I really think this information might help, if not, I promise not to post 
anything else. :)

This is a sample program I ran:

'''
# -*- coding: utf-8 -*-
import sys
import locale

de = sys.getdefaultencoding()
pd = locale.getpreferredencoding()
print de, pd

s1 = 'سلام'
print s1
s2 = u'سلام'
print s2
'''

I tried running it before and after applying the suggested patches.
Before applying any patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd3\xe1\xc7\xe3'
>>> s4
u'\xd3\xe1\xc7\xe3'
>>> print s3
سلام
>>> print s4
ÓáÇã
>>> s = u'Русский текст'
Unsupported characters in input
'''

After applying loewis's patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
>>> s4
u'\u0633\u0644\u0627\u0645'
>>> print s3
ط³ظ„ط§ظ…
>>> print s4
سلام
>>> s = u'Русский текст'
>>> print s
Русский текст
>>> 
'''

After applying serhiy.storchaka's patch:

'''
Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
ascii cp1256
ط³ظ„ط§ظ…
سلام
>>> s3 = 'سلام'
>>> s4 = u'سلام'
>>> s3
'\xd3\xe1\xc7\xe3'
>>> s4
u'\u0633\u0644\u0627\u0645'
>>> print s3
سلام
>>> print s4
سلام
>>> s = u'Русский текст'
Unsupported characters in input
'''

My point is that printing s3 and s4 in interactive mode should produce the
same results as printing s1 and s2 from the source file. Loewis's patch handled
this as I expected. This patch also solves my problem of not being able to
print u'Русский текст' (a problem caused by my Windows locale being set to
Persian, not Russian).

--


[issue15809] IDLE console uses incorrect encoding.

2013-08-19 Thread ali

Changes by ali mr.da...@gmail.com:


--
nosy: +irdb


[issue15809] IDLE console uses incorrect encoding.

2013-05-05 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Good catch. Thank you, Roger.

--
stage: patch review -> needs patch


[issue15809] IDLE console uses incorrect encoding.

2013-05-04 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

If there are no objections I will commit the patch soon.

--


[issue15809] IDLE console uses incorrect encoding.

2013-05-04 Thread Roger Serwy

Roger Serwy added the comment:

There is a problem. Adding the encoding comment to the top of the source causes 
off-by-one line errors in the traceback.

Take as an example:

>>> 1/0

Traceback (most recent call last):
  File "<pyshell#0>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero
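The off-by-one comes from the extra line placed in front of the user's input. A minimal Python 3 sketch, assuming the shell simply prepends a one-line cookie before calling compile(); the names `COOKIE`, `failing_lineno` and `corrected_lineno`, and the correction by subtraction, are hypothetical illustrations, not the code of any patch in this thread:

```python
import traceback

COOKIE = "# -*- coding: utf-8 -*-\n"  # hypothetical one-line prefix

def failing_lineno(source, prefix=""):
    """Compile prefix + source the way a shell might, run it, and return
    the line number the traceback reports for the failing statement."""
    code = compile(prefix + source, "<pyshell#0>", "exec")
    try:
        exec(code, {})
    except ZeroDivisionError as exc:
        # The innermost frame belongs to the compiled shell input.
        return traceback.extract_tb(exc.__traceback__)[-1].lineno
    return None

def corrected_lineno(source, prefix):
    # One possible correction: subtract the number of lines the prefix added.
    return failing_lineno(source, prefix) - prefix.count("\n")

print(failing_lineno("1/0"))            # 1
print(failing_lineno("1/0", COOKIE))    # 2, the off-by-one seen in the report
print(corrected_lineno("1/0", COOKIE))  # 1
```

Subtracting after the fact only fixes the number that is reported; the code object itself still carries shifted line numbers, which is why such a fix is an approximation rather than a complete solution.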


--


[issue15809] IDLE console uses incorrect encoding.

2013-04-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Here's a tangentially related issue: #14326

Yes, this issue looks like a prerequisite for it.

> IDLE doesn't handle pasting multi-line code properly (issue3559), IDLE2 will
> silently ignore code after the first executable statement. IDLE3 may give an
> error.

Well, then the patch doesn't introduce a significant regression.

> Can't we just make IDLE's shell default to UTF-8?

This is easy: just set IOBinding.encoding to utf-8. But I'm not sure we won't
lose something in that case.

--


[issue15809] IDLE console uses incorrect encoding.

2013-04-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The problem (as I understand it) is that while Martin's patch fixes Unicode
literals, it breaks byte string literals.

LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python Lib/idlelib/idle.py

>>> print u'йцук'
йцук
>>> print 'йцук'
йцук

Here is a different patch, which fixes Unicode strings and preserves byte
strings.

>>> print u'йцук'
йцук
>>> print 'йцук'
йцук

--
nosy: +kbk, roger.serwy
Added file: http://bugs.python.org/file29974/idle_compile_coding_2.patch


[issue15809] IDLE console uses incorrect encoding.

2013-04-22 Thread R. David Murray

R. David Murray added the comment:

I've combined the nosy lists and will close the other issue.  Serhiy, what if 
the source already has a coding cookie line?

--
nosy: +Pradyun.Gedam, Tomoki.Imai, ned.deily, r.david.murray


[issue15809] IDLE console uses incorrect encoding.

2013-04-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

A coding cookie line will be ignored. Note that this affects only the very
obscure case of pasting multiline code that contains a coding cookie line into
the IDLE shell (this is the only way to get a coding cookie line into the
shell). Running a file does not go through runsource(); runsource() is used
only for code entered in the shell (maybe Martin or Roger can correct me if
I'm wrong).

--


[issue15809] IDLE console uses incorrect encoding.

2013-04-22 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

An interactive Python console ignores a coding cookie line too.

$ cat test.py
# -*- coding: koi8-r -*-
print repr('йцук'), 'йцук', repr(u'йцук'), u'йцук'
$ LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python test.py
'\xe9\xf6\xf3\xea' йцук u'\u0418\u0416\u0421\u0419' ИЖСЙ
$ LC_ALL=ru_RU.cp1251 LANG=ru_RU.cp1251 ./python
Python 2.7.4+ (2.7:0f31f38e8a17+, Apr 13 2013, 21:06:36) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> # -*- coding: koi8-r -*-
... print repr('йцук'), 'йцук', repr(u'йцук'), u'йцук'
'\xe9\xf6\xf3\xea' йцук u'\u0439\u0446\u0443\u043a' йцук

('\xe9\xf6\xf3\xea' is 'йцук' in cp1251 and 'ИЖСЙ' in koi8-r)
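The same four bytes spell different words depending on the codec, and whether a cookie is honoured depends on whether the compiler sees raw bytes or already-decoded text. A Python 3 sketch of that principle (it does not reproduce the Python 2 session itself; the helper `value_of_s` is a hypothetical name): with bytes source CPython applies the koi8-r cookie, while with `str` source the text is already decoded and the cookie is ignored, much like the interactive console in the session above:

```python
raw = b"\xe9\xf6\xf3\xea"
print(raw.decode("cp1251"))  # йцук
print(raw.decode("koi8_r"))  # ИЖСЙ

cookie = b"# -*- coding: koi8-r -*-\n"
body = "s = u'йцук'\n"

def value_of_s(source):
    """Compile and run source, returning the value it assigns to s."""
    ns = {}
    exec(compile(source, "<test>", "exec"), ns)
    return ns["s"]

# Bytes source: the cookie decides how the cp1251-encoded bytes are read.
from_bytes = value_of_s(cookie + body.encode("cp1251"))
# str source: the text is already Unicode, so the cookie is ignored.
from_text = value_of_s(cookie.decode("ascii") + body)

print(from_bytes)  # ИЖСЙ, like running test.py
print(from_text)   # йцук, like the interactive console
```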

--


[issue15809] IDLE console uses incorrect encoding.

2013-04-22 Thread Roger Serwy

Roger Serwy added the comment:

Here's a tangentially related issue: #14326

IDLE doesn't handle pasting multi-line code properly (issue3559), IDLE2 will 
silently ignore code after the first executable statement. IDLE3 may give an 
error.

Can't we just make IDLE's shell default to UTF-8?

--


[issue15809] IDLE console uses incorrect encoding.

2013-01-27 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
assignee:  -> serhiy.storchaka


[issue15809] IDLE console uses incorrect encoding.

2012-11-12 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> However, this patch isn't right, since it will cause all source to be
> interpreted as UTF-8. This would be wrong when the sys.stdin.encoding is not
> UTF-8, and byte string objects are created in interactive mode.

Can you show how to reproduce the error you're talking about? I have found no
issues running bare Python and IDLE (with your patch, of course) on files in
different encodings under different locales.

--
nosy: +serhiy.storchaka


[issue15809] IDLE console uses incorrect encoding.

2012-11-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka storch...@gmail.com:


--
stage:  -> patch review


[issue15809] IDLE console uses incorrect encoding.

2012-08-31 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
nosy: +terry.reedy


[issue15809] IDLE console uses incorrect encoding.

2012-08-29 Thread Andrew Svetlov

Changes by Andrew Svetlov andrew.svet...@gmail.com:


--
title: Строки из IDLE поступают в неверной кодировке. (Russian: "Strings from
IDLE arrive in the wrong encoding.") -> IDLE console uses incorrect encoding.


[issue15809] IDLE console uses incorrect encoding.

2012-08-29 Thread alex hartwig

alex hartwig added the comment:

Text is correct. See attachment.

--


[issue15809] IDLE console uses incorrect encoding.

2012-08-29 Thread Martin v . Löwis

Martin v. Löwis added the comment:

The problem is that IDLE passes a UTF-8 encoded source string to compile, and
compile, in the absence of a source encoding declaration, uses the PEP 263
default source encoding, i.e. Latin-1.

As a consequence, the variable s has the value

u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 \xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'

IDLE's Default Source Encoding is irrelevant; it only applies to editor
windows.

One solution for that is the attached patch. However, this patch isn't right,
since it will cause all source to be interpreted as UTF-8. This would be wrong
when sys.stdin.encoding is not UTF-8 and byte string objects are created in
interactive mode.

Interactive mode manages to get it right by looking up sys.stdin.encoding
during compilation, but it does so only when in interactive mode (i.e. when
tok->prompt != NULL).

I don't see any way to fix this problem in Python 2. It is fixed in Python 3,
basically by always assuming that the source encoding is UTF-8, by making all
string objects Unicode objects, and by disallowing non-ASCII characters in
bytes literals.
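This analysis can be replayed in Python 3 by making the Latin-1 assumption explicit. The sketch below hands compile() the UTF-8 bytes of the shell input together with a latin-1 cookie (an assumption that emulates the Python 2 default described here), then compares that with passing the text as a `str`, which is how Python 3 sidesteps the problem. `value_of_s` is a hypothetical helper:

```python
line = "s = u'Русский текст'\n"

def value_of_s(source):
    """Compile and run source, returning the value it assigns to s."""
    ns = {}
    exec(compile(source, "<pyshell#0>", "exec"), ns)
    return ns["s"]

# Emulated IDLE 2 behaviour: UTF-8 bytes read back as Latin-1.
latin1_source = b"# -*- coding: latin-1 -*-\n" + line.encode("utf-8")
broken = value_of_s(latin1_source)
# ascii() escapes non-ASCII characters, giving a Python 2 style repr:
# one \xNN character per UTF-8 byte.
print(ascii(broken))

# Python 3: the source text is already Unicode, so nothing is reinterpreted.
fixed = value_of_s(line)
print(fixed)  # Русский текст
```

`ascii()` is used instead of `repr()` because Python 3's `repr()` would print characters such as `Ð` literally rather than as `\xd0`.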

--
keywords: +patch
nosy: +loewis
Added file: http://bugs.python.org/file27045/compile_unicode.diff
