[issue28246] Unable to read simple text file

2016-09-22 Thread Eryk Sun
Eryk Sun added the comment: Codepage 1251 is a single-byte encoding and a superset of ASCII (i.e. ordinals 0-127). UTF-8 is also a superset of ASCII, so there's no problem as long as the encoded text is strictly ASCII. But decoding non-ASCII UTF-8 as codepage 1251 produces nonsense, otherwise
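The point about ASCII compatibility can be checked directly. The following is a minimal sketch, not taken from the original comment; the sample strings are assumptions chosen for illustration:

```python
# ASCII-only text decodes to the same string under both encodings.
ascii_bytes = "hello".encode("utf-8")
same = ascii_bytes.decode("cp1251") == ascii_bytes.decode("utf-8")

# Non-ASCII text does not: Cyrillic A (U+0410) is two bytes in UTF-8,
# and cp1251 treats each byte as a separate, unrelated character.
utf8_bytes = "\u0410".encode("utf-8")   # b'\xd0\x90'
garbled = utf8_bytes.decode("cp1251")

print(same)            # ASCII round-trips identically
print(ascii(garbled))  # two characters, neither of them U+0410
```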

[issue28246] Unable to read simple text file

2016-09-22 Thread AndreyTomsk
AndreyTomsk added the comment: Thanks for the quick reply. I'm new to Python; I just used the tutorial docs and didn't read carefully enough to notice the encoding info. Still, IMHO the behaviour is not consistent. Of three sequential symbols in the Russian alphabet - З, И, К - it crashes on И, and displays other in
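The apparent inconsistency comes from the second UTF-8 byte of each letter. A small sketch (the helper name `decode_as_cp1251` is hypothetical) that encodes each of the three letters as UTF-8 and then tries to decode the bytes as codepage 1251, which in Python's codec leaves byte 0x98 undefined:

```python
def decode_as_cp1251(letter):
    """Encode one character as UTF-8, then try to decode it as cp1251."""
    raw = letter.encode("utf-8")
    try:
        return raw, raw.decode("cp1251"), None
    except UnicodeDecodeError as exc:
        return raw, None, exc

results = {letter: decode_as_cp1251(letter) for letter in "\u0417\u0418\u041a"}

for letter, (raw, text, error) in results.items():
    if error is None:
        print(ascii(letter), raw, "decodes to", ascii(text))
    else:
        print(ascii(letter), raw, "fails:", error.reason)
```

Only И (U+0418) encodes to a byte pair ending in 0x98, so only that letter raises; its neighbours decode without error but to the wrong characters.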

[issue28246] Unable to read simple text file

2016-09-22 Thread SilentGhost
SilentGhost added the comment: It would be good to add a FAQ / HowTo entry for this question.

[issue28246] Unable to read simple text file

2016-09-22 Thread Eryk Sun
Eryk Sun added the comment: The default encoding on your system is Windows codepage 1251. However, your file is encoded using UTF-8:

    >>> lines = open('ResourceStrings.rc', 'rb').read().splitlines()
    >>> print(*lines, sep='\n')
    b'\xef\xbb\xbf\xd0\x90 (cyrillic A)'
    b'\xd0\x98
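One way to avoid depending on the system default is to pass the encoding explicitly to `open()`. The sketch below is an illustration under stated assumptions, not the fix quoted in the thread: it writes a temporary stand-in file with made-up contents (a UTF-8 BOM followed by three Cyrillic letters) and reads it back with `utf-8-sig`, which decodes UTF-8 and drops a leading BOM if one is present:

```python
import os
import tempfile

# Hypothetical stand-in for the reporter's file: BOM + UTF-8 text.
sample = "\u0417\n\u0418\n\u041a\n"
fd, path = tempfile.mkstemp(suffix=".rc")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xef\xbb\xbf" + sample.encode("utf-8"))

# An explicit encoding makes the result independent of the locale.
with open(path, "r", encoding="utf-8-sig") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(path)
print([ascii(line) for line in lines])
```

Plain `encoding="utf-8"` would also decode the text, but the BOM would then appear as U+FEFF at the start of the first line.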

[issue28246] Unable to read simple text file

2016-09-22 Thread AndreyTomsk
New submission from AndreyTomsk: File read operation fails when it encounters a specific Cyrillic symbol. Tested with script:

    testFile = open('ResourceStrings.rc', 'r')
    for line in testFile:
        print(line)

Exception message:

    Traceback (most recent call last):
      File "min_test.py", line 6, in <module>
        for