stijn added the comment:
New here, but I think this is the right issue for asking about this Unicode
problem. On the Windows console:
> chcp
Active code page: 437
> type utf.txt
Привет
> chcp 65001
Active code page: 65001
> type utf.txt
Привет
> python --version
Python 3.5.0a0
> cat utf.py
f = open('utf.txt')
l = f.readline()
print(l)
print(len(l))
> python utf.py
Привет
�²ÐµÑ‚
�‚
13
> cat utf_explicit.py
import codecs
f = codecs.open('utf.txt', encoding='utf-8', mode='r')
l = f.readline()
print(l)
print(len(l))
> python utf_explicit.py
Привет
ет
7
I read through part of the page, but these things are a bit above my head. Could
anyone explain:
- how to figure out which codec the file object returned by open() uses?
- is there a way to change it globally to utf-8?
- the last case is almost correct: it has the correct number of characters, but
the print() still does something wrong. I got this working by using the stream
patch, but found another example for which it is not correct, see below. Any way
around this?
> type utf2.txt
aαbβcγdδ
> cat utf2.py
import streams
import codecs
streams.enable()
f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
print(f.read(1))
print(f.read(1))
print(f.read(2))
print(f.read(4))
> python utf2.py
a
α
bβc
γdδ
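For the first question, a minimal sketch of how the codec can be inspected,
assuming Python 3: the file object returned by open() exposes it as f.encoding,
and without an encoding argument it normally follows
locale.getpreferredencoding(False). The temporary files are hypothetical
stand-ins for utf.txt and utf2.txt. The last part shows what the reads in
utf2.py should return once decoding is right, independent of what the console
displays.

```python
import locale
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-ins for utf.txt and utf2.txt, written as UTF-8 bytes.
    utf_path = os.path.join(tmp, 'utf.txt')
    utf2_path = os.path.join(tmp, 'utf2.txt')
    with open(utf_path, 'wb') as f:
        f.write('Привет\n'.encode('utf-8'))
    with open(utf2_path, 'wb') as f:
        f.write('aαbβcγdδ'.encode('utf-8'))

    # Without an encoding argument, open() picks a platform default.
    locale_encoding = locale.getpreferredencoding(False)
    with open(utf_path) as f:
        implicit_encoding = f.encoding  # codec chosen by open()

    # Passing encoding= explicitly overrides that default for this file.
    with open(utf_path, encoding='utf-8') as f:
        explicit_encoding = f.encoding
        line = f.readline()

    # In text mode, read(n) counts characters, not bytes.
    with open(utf2_path, encoding='utf-8') as f:
        chunks = [f.read(1), f.read(1), f.read(2), f.read(4)]

# print() encodes with the codec of sys.stdout, which is separate from the file's.
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('locale default:', locale_encoding)
print('open() without encoding:', implicit_encoding)
print('open() with encoding:', explicit_encoding)
print('stdout:', stdout_encoding)
print('characters in line:', len(line))
print('chunk lengths:', [len(c) for c in chunks])
```

If the first two values are not utf-8, that would explain the length of 13 in
utf.py. Whether the characters then display correctly still depends on the
console; sys.stdout.encoding only tells which codec print() encodes with.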
----------
nosy: +stijn
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1602>
_______________________________________