New submission from Dāvis:
subprocess uses wrong encoding on Windows.
On Windows 10 with Python 3.5.1.
From Command Prompt (cmd.exe):
> chcp 65001
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>
From PowerShell:
> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>
As you can see, even when the console's encoding is UTF-8, subprocess still
decodes the output with the Windows ANSI code page 1257.
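To make the failure concrete: ā (U+0101) is the two bytes C4 81 in UTF-8, and code page 1257 has no character for 0x81. The sketch below assumes the child's output starts with the command name in quotes (for example cmd.exe's "is not recognized" message); the raw bytes aren't shown above, so that is an assumption, but it would put 0x81 at position 2, matching the traceback. first_undecodable is a hypothetical helper written for this illustration.

```python
def first_undecodable(data, encoding):
    """Return (offset, byte value) of the first byte `encoding` cannot
    decode, or None if `data` decodes cleanly. (Hypothetical helper.)"""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


# Assumed child output: 'ā' in quotes, encoded as UTF-8 (ā -> C4 81).
output = "'ā'".encode("utf-8")
print(output)                               # b"'\xc4\x81'"
print(first_undecodable(output, "utf-8"))   # None
print(first_undecodable(output, "cp1257"))  # (2, 129), i.e. byte 0x81
```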
This happens because io.TextIOWrapper is created without an explicit encoding,
so it falls back to locale.getpreferredencoding(False). That is wrong, because
the locale's preferred encoding is not the console's encoding.
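A minimal way to check the fallback: a TextIOWrapper created without an encoding argument should report the same codec as locale.getpreferredencoding(False). Comparing the names through codecs.lookup() only normalizes spellings such as "UTF-8" versus "utf8"; default_wrapper_encoding is a hypothetical helper for this check.

```python
import codecs
import io
import locale


def default_wrapper_encoding():
    # No encoding= argument, as described above for subprocess's
    # universal_newlines=True pipes; the wrapper picks its own default.
    wrapper = io.TextIOWrapper(io.BytesIO())
    return wrapper.encoding


# codecs.lookup() normalizes names, e.g. "UTF-8" and "utf8" both become "utf-8".
chosen = codecs.lookup(default_wrapper_encoding()).name
preferred = codecs.lookup(locale.getpreferredencoding(False)).name
print(chosen, preferred)  # the same codec twice; cp1257 on the machine above
```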
I've attached a patch that fixes this by decoding with the console's actual
encoding, taken from sys.stdout.encoding.
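Until a fix lands, one possible workaround (a sketch, not the attached patch) is to capture the child's output as bytes and decode it with an explicitly chosen encoding. getstatusoutput_decoded is a hypothetical stand-in for subprocess.getstatusoutput(); falling back to sys.stdout.encoding mirrors the idea behind the patch, so it shares the PowerShell caveat described below.

```python
import subprocess
import sys


def getstatusoutput_decoded(cmd, encoding=None):
    """Hypothetical variant of getstatusoutput() that decodes the child's
    output with an explicit encoding instead of the locale default."""
    if encoding is None:
        # Same idea as the patch: use the parent's stdout encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    proc = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT)
    # Decode the raw bytes ourselves and normalize Windows line endings.
    text = proc.stdout.decode(encoding, errors="replace").replace("\r\n", "\n")
    # getstatusoutput() strips a single trailing newline; do the same.
    if text.endswith("\n"):
        text = text[:-1]
    return proc.returncode, text
```

For example, getstatusoutput_decoded("echo hello", "utf-8") should return (0, "hello"). Passing the encoding explicitly avoids guessing at all, which may be preferable when the caller knows what the child writes.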
Note, however, that there is a separate bug: when Python is executed inside a
PowerShell group expression, sys.stdout.encoding is wrong:
> [Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
> ([Console]::OutputEncoding.EncodingName)
Unicode (UTF-8)
> python -c "import sys; print(sys.stdout.encoding)"
cp65001
> (python -c "import sys; print(sys.stdout.encoding)")
cp1257
It should still be cp65001, which is why subprocess will still fail in this
case even with my patch; but that is a different bug.
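To compare the two cases, a small diagnostic script like the one below can be run once directly and once inside ( ... ). It prints isatty() next to both encodings on the assumption, not verified here, that the cp1257 result appears exactly when stdout is not attached to the console; describe_stdout is a hypothetical name.

```python
import locale
import sys


def describe_stdout():
    # Report whether stdout is a console and which encodings Python chose.
    stream = sys.stdout
    return {
        "isatty": stream.isatty() if stream is not None else None,
        "stdout_encoding": getattr(stream, "encoding", None),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(describe_stdout())
```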
----------
components: IO, Library (Lib), Unicode, Windows
files: subprocess_fix_encoding.patch
keywords: patch
messages: 266852
nosy: davispuh, ezio.melotti, haypo, paul.moore, steve.dower, tim.golden,
zach.ware
priority: normal
severity: normal
status: open
title: subprocess uses wrong encoding on Windows
type: behavior
versions: Python 3.5, Python 3.6
Added file: http://bugs.python.org/file43094/subprocess_fix_encoding.patch
_______________________________________
Python tracker
<http://bugs.python.org/issue27179>
_______________________________________