[issue13693] email.Header.Header incorrect/non-smart on international charset address fields

kxroberto Sun, 01 Jan 2012 09:25:06 -0800

New submission from kxroberto <kxrobe...@users.sourceforge.net>:

The email.* package seems to over-encode address header fields that contain 
international characters, which even leads to display errors in the 
recipient's mail reader. This happens when the message header is composed as 
recommended in http://docs.python.org/library/email.header.html


Python 2.7.2
>>> e=email.Parser.Parser().parsestr(getcliptext())
>>> e['From']
'=?utf-8?q?Martin_v=2E_L=C3=B6wis?= <rep...@bugs.python.org>'
# note the par
>>> email.Header.decode_header(_)
[('Martin v. L\xc3\xb6wis', 'utf-8'), ('<rep...@bugs.python.org>', None)]
# unfortunately there is no comfortable function for this:
>>> u='Martin v. L\xc3\xb6wis'.decode('utf8') + ' <rep...@bugs.python.org>'
>>> u
u'Martin v. L\xf6wis <rep...@bugs.python.org>'
>>> msg=email.Message.Message()
>>> msg['From']=u
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>> msg['From']=str(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <rep...@bugs.python.org>\n\n'
>>> msg['From']=email.Header.Header(u)
>>> msg.as_string()
'From: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\nFrom: Martin v. L\xf6wis <rep...@bugs.python.org>\nFrom: =?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=\n\n'
>>> 
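
For the decoding step in the session above, a minimal sketch, assuming 
Python 3, where decode_header and make_header can be combined into a single 
call. The helper name decode_header_value and the example.org address are 
placeholders for illustration, not part of the original session:

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Decode RFC 2047 encoded words in a header value into one str."""
    # decode_header splits the value into (bytes, charset) chunks;
    # make_header reassembles them into a Header that str() can render.
    return str(make_header(decode_header(raw)))


# Placeholder address; the display name is the one from the session above.
raw_from = "=?utf-8?q?Martin_v=2E_L=C3=B6wis?= <report@example.org>"
decoded_from = decode_header_value(raw_from)
```

This only covers reading a header; it does not change how the package 
encodes one on output.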

(BTW: it is strange that multiple msg['From']=... _assignments_ end up as 
multiple additions. Also, msg renders 8-bit header lines without any warning, 
error, or auto-encoding, while it does auto-encode unicode.)
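
On the repeated assignments: as far as I can tell from the Message 
documentation, item assignment appends a header field instead of replacing an 
existing one, so the accumulation is expected. A minimal sketch of replacing a 
field, assuming Python 3 and placeholder example.org addresses:

```python
from email.message import Message

msg = Message()
msg["From"] = "first@example.org"
msg["From"] = "second@example.org"   # appended, does not overwrite
duplicates = msg.get_all("From")     # both values are still present

# Deleting removes every occurrence of the field, so the next
# assignment leaves exactly one From header.
del msg["From"]
msg["From"] = "only@example.org"
remaining = msg.get_all("From")
```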

What finally arrives at the receiver typically looks like this:

From: "=?utf-8?b?TWFydGluIHYuIEzDtndpcyA8cmVwb3J0QGJ1Z3MucHl0aG9uLm9yZz4=?=" 
<rep...@bugs.python.org>

Because the servers seem to want the address unencoded, they extract the 
address and _add_ it again as ASCII, duplicating it. => error

I have not found any emails in my archives whose address header fields are 
over-encoded the way Python does it. Even in non-address fields, mostly only 
those words or groups that need encoding are encoded.

I assume that the sophisticated, high-level-looking email.* package does not 
expect the user to piece things together at a low level with parseaddr, 
re.search, make_header, Header.encode, '.join ... Or is that indeed 
(undocumented) the intended way? IMHO it should be smart enough to do this 
automatically.
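
To make the low-level route concrete, here is a minimal sketch, assuming 
Python 3.3 or later, where email.utils.formataddr accepts a charset and 
encodes only a non-ASCII display name. The helper name encode_address_header 
and the example.org address are placeholders, and this is not necessarily the 
approach the package intends:

```python
from email.header import decode_header
from email.message import Message
from email.utils import formataddr, parseaddr


def encode_address_header(value, charset="utf-8"):
    """Encode only the display name; keep the address itself in ASCII."""
    name, addr = parseaddr(value)
    # formataddr leaves an ASCII name alone and turns a non-ASCII name
    # into an RFC 2047 encoded word; the address part is never encoded.
    return formataddr((name, addr), charset)


# Placeholder address for illustration only.
encoded_from = encode_address_header("Martin v. L\u00f6wis <report@example.org>")

msg = Message()
msg["From"] = encoded_from
rendered = msg.as_string()
```

The point of the split is that the address stays readable to servers, while 
the display name still round-trips through decode_header.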

Note: there is an old, deprecated function, mimify.mime_encode_header, which 
seems to try to auto-encode cautiously and sparingly (but it actually fails 
too on all the examples I tried).

----------
components: Library (Lib)
messages: 150434
nosy: kxroberto
priority: normal
severity: normal
status: open
title: email.Header.Header incorrect/non-smart on international charset address 
fields
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13693>
_______________________________________
