New submission from Marko Lalic:
When the message's Content-Transfer-Encoding is set to 8bit, the
get_payload(decode=True) method returns the payload encoded using
raw-unicode-escape. This means that it is impossible to decode the returned
bytes using the content charset obtained by the
Serhiy Storchaka added the comment:
>>> message.get_payload(decode=True).decode('latin1')
'ünicöde data..'
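
A minimal sketch of why decoding with latin1 works for this sample, assuming (as the report states) that the returned bytes were produced with raw-unicode-escape. It exercises only Python's built-in codecs, not the email package, and the sample strings and variable names are illustrative:

```python
# Compare raw-unicode-escape with Latin-1 on two sample strings.
latin_text = "ünicöde data.."   # every code point is below U+0100
wide_text = "\u20ac"            # EURO SIGN, outside the Latin-1 range

latin_raw = latin_text.encode("raw-unicode-escape")
wide_raw = wide_text.encode("raw-unicode-escape")

# Below U+0100 the two codecs produce identical bytes, so decoding
# with latin1 recovers the original text.
latin_roundtrip = latin_raw.decode("latin1")

# Outside Latin-1 the codec emits a literal \uXXXX escape. latin1 turns
# that into six ASCII characters, not the original character.
wide_as_latin1 = wide_raw.decode("latin1")

# Decoding with the codec that produced the bytes does recover the text.
wide_roundtrip = wide_raw.decode("raw-unicode-escape")
```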
--
nosy: +serhiy.storchaka
versions: +Python 3.4
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue18271
Marko Lalic added the comment:
That will work fine as long as the characters are actually Latin-1. We cannot
forget the rest of the Unicode character planes. Consider::
>>> message = message_from_string("""MIME-Version: 1.0
... Content-Type: text/plain; charset=utf-8
... Content-Disposition: inline
R. David Murray added the comment:
The python3 email package's handling of 8bit definitely has quirks. (So did
the python2 email package's, but they were different quirks. :)
You can't correctly handle 8bit unless you use message_from_bytes and take the
input from a byte string. It is a
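
A minimal sketch of the bytes-based approach, assuming a UTF-8 body sent with an 8bit Content-Transfer-Encoding. The sample message and variable names are illustrative and not taken from the report:

```python
from email import message_from_bytes

# Illustrative raw message: the body is UTF-8 text sent as 8bit.
raw = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + "ünicöde data..\n".encode("utf-8")
)

message = message_from_bytes(raw)

# With bytes input, decode=True hands back the body bytes as received.
payload = message.get_payload(decode=True)

# The declared charset can then be used to decode those bytes.
charset = message.get_content_charset()
text = payload.decode(charset)
```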
Marko Lalic added the comment:
Thank you for your reply.
Unfortunately, I have a use case where message_from_bytes has a significant
disadvantage. I have to parse the received message and then forward it
completely unchanged, apart from possibly adding a few new headers. The problem
with
R. David Murray added the comment:
If all you are changing is headers (and you don't change the CTE), then when
you use BytesGenerator to re-serialize the message, it is supposed to preserve
the existing CTE/payload. (Whether or not you call get_payload, regardless of
arguments, does not
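
A minimal sketch of the forwarding case described above: parse from bytes, add a header, and re-serialize with BytesGenerator. It assumes the default (compat32) policy. The header name `X-Forwarded-By` and the sample message are hypothetical:

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO

# Illustrative 8bit message with a UTF-8 body.
body = "ünicöde data..\n".encode("utf-8")
raw = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    + body
)

message = message_from_bytes(raw)

# Only a header is added; the CTE and payload are left untouched.
message["X-Forwarded-By"] = "example"  # hypothetical header name and value

# Re-serialize to bytes.
buffer = BytesIO()
BytesGenerator(buffer).flatten(message)
forwarded = buffer.getvalue()
```

In this sketch `forwarded` ends with the same body bytes that were parsed, and the new header is appended after the existing ones.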