New submission from Dieter Maurer <[email protected]>:
In the transcript below, "ms" and "mb" should be equivalent:
>>> from email import message_from_string, message_from_bytes
>>> mt = """\
... Mime-Version: 1.0
... Content-Type: text/plain; charset=UTF-8
... Content-Transfer-Encoding: 8bit
...
... ä
... """
>>> ms = message_from_string(mt)
>>> mb = message_from_bytes(mt.encode("UTF-8"))
But "mb.as_bytes" succeeds while "ms.as_bytes" raises a "UnicodeEncodeError":
>>> mb.as_bytes()
b'Mime-Version: 1.0\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: 8bit\n\n\xc3\xa4\n'
>>> ms.as_bytes()
Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.9/email/generator.py", line 155, in _write_lines
    self.write(line)
  File "/usr/local/lib/python3.9/email/generator.py", line 406, in write
    self._fp.write(s.encode('ascii', 'surrogateescape'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 0: ordinal not in range(128)
Apparently, "as_bytes" ignores the "charset" parameter of the
"Content-Type" header (it should use "utf-8", not "ascii", for the encoding).
----------
components: email
messages: 373711
nosy: barry, dmaurer, r.david.murray
priority: normal
severity: normal
status: open
title: "email.message.Message.as_bytes": fails to correctly handle "charset"
type: behavior
versions: Python 3.9
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41307>
_______________________________________