[issue25545] email parsing docs: clarify that only ASCII strings are supported

Jason R. Coombs Wed, 05 Dec 2018 12:37:35 -0800


Jason R. Coombs added the comment:


I don't think this ticket should be implemented as described.

Consider the use case in importlib_metadata, which loads metadata from a 
package in a known encoding. It has already decoded the full message to text 
and now wants to parse it. Parsing already-decoded content seems well within 
the remit of something like email.parser.
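A minimal sketch of that flow, assuming UTF-8 metadata: decode the bytes with the encoding the caller already knows, then hand the text to the stdlib parser. `parse_metadata` and the sample fields are hypothetical and illustrative only; this is not importlib_metadata's actual code.

```python
import email
from email.message import Message


def parse_metadata(raw: bytes, encoding: str = "utf-8") -> Message:
    """Hypothetical helper: the caller already knows the encoding, so decode
    first and let the email parser work on plain text."""
    text = raw.decode(encoding)
    return email.message_from_string(text)


# Illustrative metadata blob (made-up values), stored as UTF-8 bytes.
raw = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Author: José Müller\n"
    "Classifier: Programming Language :: Python :: 3\n"
    "Classifier: Natural Language :: German\n"
    "\n"
    "Long description.\n"
).encode("utf-8")

meta = parse_metadata(raw)
print(meta["Name"], meta["Author"])
print(meta.get_all("Classifier"))  # repeated headers come back as a list
```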

Yes, the RFCs describe how to decode bytes content, but that shouldn't preclude 
the email module from supporting parsing of Unicode text.
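For contrast, a sketch of the bytes-oriented path the RFCs describe, using an RFC 2047 encoded-word as one example: the message on the wire is pure ASCII, and the non-ASCII header text is recovered only by decoding the charset-tagged word. The sample message is made up for illustration.

```python
import email
from email.header import decode_header, make_header

# RFC-style wire format: ASCII-only bytes, with the non-ASCII subject carried
# as an encoded-word that names its own charset (utf-8, quoted-printable).
wire = b"Subject: =?utf-8?q?Caf=C3=A9?=\n\nbody\n"

msg = email.message_from_bytes(wire)
raw_subject = msg["Subject"]  # default compat32 policy leaves the encoded-word as-is
subject = str(make_header(decode_header(raw_subject)))
print(raw_subject, "->", subject)
```

Text that has already been decoded skips this step entirely, which is the case the comment argues the parser should keep supporting.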

In fact, the library does appear able to parse non-ASCII Unicode text, 
especially on Python 3. Consider the attached 'parse-text.py'. It illustrates 
that the parser already mostly meets my expectations: on Python 2.7 and 3.7, 
email messages are parsed from Unicode text without any indication of an 
encoding, and Unicode text is returned on both Python 2 and Python 3.
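The Python 3 behavior described above can be checked with a short sketch. It is not the attached script, whose contents aren't reproduced here. The message declares no Content-Type and so no charset, yet the non-ASCII header and body come back as `str` under the default (compat32) policy.

```python
from email.parser import Parser

# No Content-Type header, so no charset is declared anywhere in the message.
text = (
    "Subject: Grüße aus Köln\n"
    "\n"
    "Schöne Grüße!\n"
)

msg = Parser().parsestr(text)
subject = msg["Subject"]
body = msg.get_payload()  # decode=False: the payload text as parsed
print(type(subject).__name__, subject)
print(msg.get_content_charset())  # no encoding was indicated
```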

Python 2 is deficient in that message_from_string raises a UnicodeEncodeError 
when it constructs a bytes-oriented StringIO from non-ASCII input. This is 
easily worked around by passing a text-oriented io.StringIO to the parser instead.
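A sketch of that workaround, assuming the caller already holds decoded text. `parse_text` is a hypothetical name. It wraps the input in `io.StringIO` and calls `Parser.parse`, bypassing the string-based entry point. The example below was written against Python 3. On Python 2.7 the same shape should apply with a `unicode` argument, but that is untested here.

```python
import io
from email.parser import Parser


def parse_text(text):
    """Hypothetical helper: parse an already-decoded message via a text stream.

    io.StringIO is text-oriented, so non-ASCII characters pass through to the
    parser unchanged (on Python 2, ``text`` must be a ``unicode`` object).
    """
    return Parser().parse(io.StringIO(text))


msg = parse_text("Name: demo\nAuthor: José Müller\n\nBody text.\n")
print(msg["Author"])
```

On Python 3, `email.message_from_string` accepts the same text directly, so this form mainly matters for code that must run on both versions.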

Still, I would argue the current behavior is desirable and shouldn't be 
deprecated.

----------
nosy: +barry, jason.coombs
Added file: https://bugs.python.org/file47978/parse-text.py

_______________________________________
Python tracker
<https://bugs.python.org/issue25545>
_______________________________________
_______________________________________________
Python-bugs-list mailing list