New submission from Leslie P. Polzer:

http://hg.python.org/cpython/file/3.3/Lib/smtpd.py#l289

currently decodes incoming bytes as UTF-8.

However, an SMTP server must not attempt to interpret characters beyond ASCII.
Originally mail servers were not 8-bit clean, meaning they guaranteed to
preserve only the lower 7 bits of each octet.
Even then, however, they were not expected to choke on input by trying to
decode it into a specific extended charset. Whenever a mail server does not
need to interpret data (such as base64-encoded auth information), the data is
simply left alone and passed through.

I do not know why the code ended up this way, but to correct this behavior
and make it possible to support the 8BITMIME feature I suggest decoding
received bytes as latin1, leaving it to the user to reinterpret them as UTF-8
or whatever charset they need. Any other single-byte encoding that maps all
256 byte values could be used for this, but latin1 is the default in asynchat.
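
To illustrate why latin1 works as a pass-through here, the following is a
minimal sketch, not the proposed patch. The sample payload and all variable
names are made up for the example. It shows that decoding as UTF-8 can fail on
arbitrary 8-bit input, while a latin1 decode always succeeds and can be
reversed to recover the exact bytes received.

```python
# Hypothetical 8-bit payload: UTF-8 encoded text ("caf" + e-acute) followed
# by a lone 0xFF byte, which is never valid in UTF-8.
raw = "caf\u00e9".encode("utf-8") + b"\xff"

# Decoding as UTF-8 raises on the stray byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin1 maps each byte 0x00-0xFF to the code point with the same value,
# so decoding cannot fail and encoding restores the original bytes.
text = raw.decode("latin1")
roundtrip = text.encode("latin1")

# The user can then reinterpret the portion known to be UTF-8.
recovered = roundtrip[:-1].decode("utf-8")
```

With this approach the server never rejects input because of its charset, and
the choice of how to interpret the bytes stays with the user.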

The documentation should also mention charset handling. I'll be happy to submit 
a patch for both code and docs.

----------
components: Library (Lib)
messages: 203467
nosy: skypher
priority: normal
severity: normal
status: open
title: smtpd.py should not decode utf-8
type: enhancement
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19662>
_______________________________________