[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-26 Thread Eryk Sun

Eryk Sun  added the comment:

> If you are lost with locale encodings, you can attempt to encode
> everything in UTF-8 and enable the Python UTF-8 Mode:

rushant is using Python 3.6. UTF-8 mode was added in 3.7, so it's not an option 
without first upgrading to 3.7. Also, it's important to note that the 
suggestion to "attempt to encode everything in UTF-8" includes whatever 
terminal encoding or shell-script file encoding is used for `export a="中文"`. If 
it's not using UTF-8, then setting the preferred encoding in Python to UTF-8 
isn't going to help.
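
A minimal sketch of how a script could check whether UTF-8 mode is active without failing on 3.6. The helper name utf8_mode_enabled is hypothetical; sys.flags.utf8_mode itself only exists on 3.7+, where the mode can be enabled with `-X utf8` or `PYTHONUTF8=1`.

```python
import sys


def utf8_mode_enabled():
    """Return True if the Python UTF-8 Mode is active, False otherwise.

    sys.flags.utf8_mode exists only on Python 3.7+, so getattr() with a
    default keeps this working on 3.6, where the mode is not available.
    """
    return bool(getattr(sys.flags, "utf8_mode", 0))


print(sys.version_info[:2], utf8_mode_enabled())
```

Even when this reports True, the bytes placed in the environment by the shell still have to be UTF-8 for the decoded value to be correct.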

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-26 Thread STINNER Victor


STINNER Victor  added the comment:

Oh, I forgot to note that Windows is not affected by this issue, since Windows 
provides environment variables directly as Unicode, so Python doesn't need 
to decode byte strings to read os.environ['a'] ;-)
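
As a rough illustration of that platform difference, the standard os.supports_bytes_environ flag tells whether the native environment is bytes. The helper raw_env_value below is a hypothetical name used only for this sketch.

```python
import os


def raw_env_value(name):
    """Return the undecoded bytes of an environment variable, or None.

    Where the native environment is not bytes (os.supports_bytes_environ
    is False, as on Windows), os.environb does not exist, so there is
    nothing to decode and None is returned.
    """
    if not os.supports_bytes_environ:
        return None
    return os.environb.get(os.fsencode(name))


# The repr of a bytes object is plain ASCII, so this is safe to print.
print(os.supports_bytes_environ, raw_env_value("a"))
```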

--




[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-26 Thread STINNER Victor


STINNER Victor  added the comment:

Python works as expected: the UTF-8 codec doesn't allow encoding surrogate 
characters.

The surrogate characters come from os.environ['a'] because this environment 
variable contains bytes which cannot be decoded with the encoding returned by 
sys.getfilesystemencoding().

You should fix your system setup, especially the locale encoding. The string 
stored in the "a" environment variable was not encoded with the Python filesystem 
encoding:
https://docs.python.org/dev/glossary.html#term-filesystem-encoding-and-error-handler

If you are lost with locale encodings, you can attempt to encode everything in 
UTF-8 and enable the Python UTF-8 Mode:
https://docs.python.org/dev/library/os.html#python-utf-8-mode

Good luck with your setup ;-)

Hint: use print(ascii(job_name)) to dump the string content.
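
To show what that dump would look like, here is a small sketch that simulates the reporter's situation instead of reading the real environment. It assumes the variable held the UTF-8 bytes of 中文 and that the filesystem encoding was ASCII with the "surrogateescape" handler (the usual case for the "C" locale on 3.6).

```python
# UTF-8 bytes of the two characters, as the shell would have stored them.
raw = b"\xe4\xb8\xad\xe6\x96\x87"

# Simulate how os.environ['a'] is decoded under an ASCII locale:
# every undecodable byte becomes one lone surrogate code point.
job_name = raw.decode("ascii", errors="surrogateescape")

print(len(job_name))
print(ascii(job_name))
# → 6
# → '\udce4\udcb8\udcad\udce6\udc96\udc87'
```

Six surrogates instead of two CJK characters is consistent with "position 0-5" in the reported UnicodeEncodeError.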

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-26 Thread Eryk Sun

Eryk Sun  added the comment:

I think this is a locale configuration problem, in which the locale encoding 
doesn't match the terminal encoding. If so, it can be closed as not a bug.

> export a="中文"

In POSIX, the shell reads "中文" from the terminal as bytes encoded in the 
terminal encoding, which could be UTF-8 or some legacy encoding. The value of 
`a` is set directly as this encoded text. There is no intermediate 
decode/encode stage in the shell. For a child process that decodes the value of 
the environment variable, as Python does, the locale's LC_CTYPE encoding should 
be the same or compatible with the terminal encoding.
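
A sketch of why the terminal encoding matters. GBK is used here only as an example of a legacy encoding; the report doesn't say which encoding the terminal actually used.

```python
text = "中文"  # what the user typed after `export a=`

# The shell stores whatever bytes the terminal sent, with no re-encoding.
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")
print(utf8_bytes, gbk_bytes)

# A child process whose locale encoding matches the terminal recovers the text.
matching = gbk_bytes.decode("gbk")

# A child process with a UTF-8 locale reading GBK bytes cannot decode them;
# with "surrogateescape" (as used for os.environ) it gets lone surrogates.
mismatched = gbk_bytes.decode("utf-8", errors="surrogateescape")
print(ascii(mismatched))

# The original bytes are still recoverable from the escaped string.
restored = mismatched.encode("utf-8", errors="surrogateescape")
```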

> job_name = os.environ['a']
> print(job_name)

In POSIX, sys.stdout.errors, as used by print(), will be "surrogateescape" if 
the default LC_CTYPE locale is a legacy locale -- which in 3.6 is the case for 
the "C" locale, since it's usually limited to 7-bit ASCII. "surrogateescape" is 
also the error handler used to decode the bytes in os.environb (POSIX) as the 
text in os.environ. When decoding, "surrogateescape" handles each byte value 
that can't be decoded by translating it to a code point in the reserved 
surrogate range U+DC80 - U+DCFF. When encoding, it translates each surrogate 
code point back to the original byte value in the range 0x80 - 0xFF.
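
The byte-to-surrogate mapping can be checked directly with the ASCII codec. This is only an illustration of the handler, not of any particular locale setup.

```python
data = bytes([0x41, 0x80, 0xFF])  # 'A' plus the two ends of the non-ASCII range

# Decoding: 0x41 is valid ASCII; 0x80 and 0xFF are escaped as surrogates.
text = data.decode("ascii", errors="surrogateescape")
code_points = [hex(ord(ch)) for ch in text]
print(code_points)
# → ['0x41', '0xdc80', '0xdcff']

# Encoding with the same handler restores the original bytes.
back = text.encode("ascii", errors="surrogateescape")
print(back == data)
# → True
```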

Given the above setup, byte sequences in os.environb that can't be decoded with 
the default LC_CTYPE locale encoding will be surrogate escaped in the decoded 
text. The surrogate-escaped values round-trip back to bytes when printed, 
presumably in the terminal encoding.

> with open('name.txt', 'w', encoding='utf-8') as fw:
>     fw.write(job_name)

The default errors handler for open() is "strict" instead of "surrogateescape", 
so the surrogate-escaped values in job_name cause the encoding to fail.
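
A sketch that reproduces the failure and shows two possible workarounds. It simulates job_name instead of reading the environment, and assumes the original bytes were UTF-8 decoded under an ASCII locale; neither workaround replaces fixing the locale.

```python
import os
import tempfile

raw = b"\xe4\xb8\xad\xe6\x96\x87"  # UTF-8 bytes of 中文
job_name = raw.decode("ascii", errors="surrogateescape")

# 1. Strict encoding (the open() default) rejects the surrogates.
try:
    job_name.encode("utf-8")
    error = None
except UnicodeEncodeError as exc:
    error = exc
print(error)

# 2. Workaround A: write with errors="surrogateescape" so the original
#    bytes are written back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "name.txt")
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as fw:
        fw.write(job_name)
    with open(path, "rb") as fr:
        written = fr.read()

# 3. Workaround B: undo the wrong decoding, then decode with the encoding
#    the bytes were really in (assumed to be UTF-8 here).
recovered = job_name.encode("ascii", errors="surrogateescape").decode("utf-8")
```

With a real environment variable, os.fsencode(job_name) performs the "undo" step using the interpreter's actual filesystem encoding.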

> Your code runs for me on Windows

In Windows, Python uses the wide-character (16-bit wchar_t) environment of the 
process for os.environ, and, in 3.6+, it uses the console session's 
wide-character API for console files such as sys.std* when they aren't 
redirected to a pipe or disk file. Conventionally, wide-character strings 
should be valid UTF-16LE text. So getting "中文" from os.environ and printing it 
should 'just work'. The output will even be displayed correctly if the console 
session uses a font that supports "中文", or if it's a pseudoconsole (conpty) 
session that's attached to a terminal that supports automatic font fallback, 
such as Windows Terminal.

--
components: +IO, Interpreter Core, Library (Lib), Unicode -C API
nosy: +eryksun, ezio.melotti, vstinner




[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-26 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

3.6 only gets security patches.  You or someone else needs to show an unfixed 
bug in master.  Your code runs for me on Windows, whereas you appear to be 
using *nix.  Replacing the file write with job_name.encode() should have the 
same behavior.  Do you see the same with job_name="中文" at the top instead?

--
nosy: +terry.reedy




[issue43576] python3.6.4 os.environ error when write chinese to file

2021-03-20 Thread rushant

New submission from rushant <953779...@qq.com>:

# -*- coding: utf-8 -*-
import os
job_name = os.environ['a']
print(job_name)
print(isinstance(job_name, str))
print(type(job_name))
with open('name.txt', 'w', encoding='utf-8') as fw:
    fw.write(job_name)


I set the environment variable with:
export a="中文"
Running the script fails with this error:
中文
True
<class 'str'>
Traceback (most recent call last):
  File "aa.py", line 8, in <module>
    fw.write(job_name)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed

--
components: C API
messages: 389215
nosy: rushant
priority: normal
severity: normal
status: open
title: python3.6.4 os.environ error when write chinese to file
type: behavior
versions: Python 3.6
