[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread Marko Lalic
New submission from Marko Lalic: When the message's Content-Transfer-Encoding is set to 8bit, the get_payload(decode=True) method returns the payload encoded using raw-unicode-escape. This means that it is impossible to decode the returned bytes using the content charset obtained by the

[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread Serhiy Storchaka
Serhiy Storchaka added the comment:
    >>> message.get_payload(decode=True).decode('latin1')
    'ünicöde data..'
nosy: +serhiy.storchaka
versions: +Python 3.4
http://bugs.python.org/issue18271

[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread Marko Lalic
Marko Lalic added the comment: That will work fine as long as the characters are actually Latin. We cannot forget the rest of the Unicode character planes. Consider:: message = message_from_string(MIME-Version: 1.0 ... Content-Type: text/plain; charset=utf-8 ... Content-Disposition: inline
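
The point about non-Latin characters can be illustrated with the codec alone, without the email package. The following is a minimal sketch that only shows how Python's raw-unicode-escape codec behaves; the sample strings and the helper try_decode are assumptions made for illustration and are not taken from the report.

```python
# Sketch: why bytes produced by raw-unicode-escape decode as latin-1
# but not as the message's declared charset (utf-8 here).

text = "ünicöde data.."          # every code point is below U+0100
escaped = text.encode("raw-unicode-escape")

# Code points below U+0100 are written as single bytes, which is
# exactly what latin-1 would produce.
same_as_latin1 = escaped == text.encode("latin-1")

wide = "ć"                       # U+0107, outside latin-1
wide_escaped = wide.encode("raw-unicode-escape")  # a literal backslash-u escape


def try_decode(data, charset):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


# The latin-1 style bytes are not valid utf-8 ...
utf8_result = try_decode(escaped, "utf-8")
# ... and the escaped form decodes to the six characters of the escape
# sequence, not to the original character.
latin1_result = try_decode(wide_escaped, "latin-1")
```

Decoding with latin1 therefore round-trips only while every character fits in a single byte; anything beyond that comes back as an escape sequence.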

[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread R. David Murray
R. David Murray added the comment: The Python 3 email package's handling of 8bit definitely has quirks. (So did the Python 2 email package's, but they were different quirks. :) You can't correctly handle 8bit unless you use message_from_bytes and take the input from a byte string. It is a
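
A minimal sketch of the message_from_bytes route described above, assuming a single-part text message with an 8bit body; the sample message itself is made up for illustration.

```python
from email import message_from_bytes

# Assumed sample input: a utf-8 body sent with an 8bit transfer encoding.
body_text = "ünicöde ć data.."
body_bytes = body_text.encode("utf-8")
raw = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + body_bytes

# Parsing from bytes lets the package keep track of the original octets.
message = message_from_bytes(raw)

# decode=True returns the payload as bytes; for 8bit these are the
# original body bytes, so the declared charset can decode them.
payload = message.get_payload(decode=True)
charset = message.get_content_charset()
decoded = payload.decode(charset)
```

Here the charset comes from the message's own Content-Type header, so no encoding has to be guessed by the caller.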

[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread Marko Lalic
Marko Lalic added the comment: Thank you for your reply. Unfortunately, I have a use case where message_from_bytes has a significant disadvantage. I have to parse the received message and then forward it completely unchanged, apart from possibly adding a few new headers. The problem with

[issue18271] get_payload method returns bytes which cannot be decoded using the message's charset

2013-06-20 Thread R. David Murray
R. David Murray added the comment: If all you are changing is headers (and you don't change the CTE), then when you use BytesGenerator to re-serialize the message, it is supposed to preserve the existing CTE/payload. (Whether or not you call get_payload, regardless of arguments, does not
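
The forwarding case can be sketched as follows, assuming the default (compat32) policy and a single-part 8bit message. The sample message and the header name X-Forwarded-By are hypothetical and chosen only for illustration.

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO

# Assumed sample input, as it might arrive off the wire.
body_bytes = "ünicöde ć data..\n".encode("utf-8")
raw = (
    b"MIME-Version: 1.0\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + body_bytes

message = message_from_bytes(raw)

# Inspecting the payload does not alter what gets serialized later.
message.get_payload(decode=True)

# Only a header is added; the CTE is left alone.
message["X-Forwarded-By"] = "example"   # hypothetical header name

# Re-serialize to bytes.
buffer = BytesIO()
BytesGenerator(buffer).flatten(message)
forwarded = buffer.getvalue()

# Everything after the first blank line is the body.
forwarded_headers, forwarded_body = forwarded.split(b"\n\n", 1)
```

In this sketch the body bytes come out identical to the input, and the new header is appended after the existing ones.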