[issue45938] EmailMessage as_bytes

Marc Villain Tue, 30 Nov 2021 07:19:08 -0800

New submission from Marc Villain <[email protected]>:

I am parsing an email whose subject header contains a multi-byte UTF-8 
character split across two encoded words. When a second, complete encoded 
character follows in the same header, we get the following error:


> 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed

This error can be reproduced using the following:
>>> from email.message import EmailMessage
>>> msg = EmailMessage()
>>> msg.add_header('subject', '=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o_a=C3=B1o?=')
>>> print(str(msg))         # This will succeed
>>> print(msg.as_bytes())   # This will fail
>>> print(msg.as_string())  # This will fail

After a bit of investigation, it appears the library is at some point trying 
to concatenate 'a\udcc3\udcb1o ' and 'cómo'. It then calls _ew.encode in 
email._header_value_parser._fold_as_ew on the result. This fails because 
'\udcc3\udcb1' are lone surrogates, which cannot be encoded as UTF-8, whereas 
'cómo' can.
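
The encoding failure can be shown without the email package at all. The 
sketch below assumes the lone surrogates come from decoding each half of the 
split character with the "surrogateescape" error handler; that is an 
assumption about the source of the surrogates, not something confirmed from 
the library code here.

```python
# Assumption: each half of the split character was decoded separately with
# the "surrogateescape" error handler, which maps undecodable bytes to lone
# surrogates in the range U+DC80..U+DCFF.
half = b"a\xc3".decode("utf-8", errors="surrogateescape")
rest = b"\xb1o".decode("utf-8", errors="surrogateescape")
joined = half + rest  # 'a\udcc3\udcb1o'

# A strict UTF-8 encode rejects the lone surrogates.
try:
    joined.encode("utf-8")
except UnicodeEncodeError as exc:
    error_text = str(exc)
else:
    error_text = None

# Encoding with the same error handler restores the original bytes, which
# are valid UTF-8 once the two halves are adjacent again.
raw = joined.encode("utf-8", errors="surrogateescape")
recovered = raw.decode("utf-8")

print(error_text)
print(ascii(recovered))
```

The strict encode reports the same "surrogates not allowed" condition as the 
traceback above, while the round trip through "surrogateescape" recovers the 
intended text.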

More tests:
[OK] '=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o_a=C3=B1o?='
     > b' subject: =?utf-8?q?a=C3=B1o_c=C3=B3mo?=\n\n'
[OK] '=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o_cmo?='
     > b' subject: =?unknown-8bit?q?a=C3=B1o?= cmo\n\n'
[OK] '=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o?= =?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o?='
     > b' subject: =?unknown-8bit?q?a=C3=B1oa=C3=B1o?=\n\n'
[KO] '=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o_a=C3=B1o?='
     > 'utf-8' codec can't encode characters in position 1-2: surrogates not allowed

I am not sure what the best way to fix this is.
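
As a possible caller-side workaround (a sketch only, not a fix for the 
library), the raw header value could be decoded with 
email.header.decode_header before it is assigned to the message. That 
function joins the bytes of adjacent encoded words that share a charset, so 
the split character is reassembled before decoding. The helper name 
decode_raw_header is hypothetical, and the sketch assumes every charset named 
in the header is one Python knows.

```python
from email.header import decode_header
from email.message import EmailMessage


def decode_raw_header(raw_value):
    """Hypothetical helper: decode an RFC 2047 header value to plain text."""
    parts = []
    for chunk, charset in decode_header(raw_value):
        # decode_header returns bytes for encoded words and may return str
        # when the value contains no encoded words at all.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


raw_subject = "=?UTF-8?Q?a=C3?= =?UTF-8?Q?=B1o_a=C3=B1o?="
subject = decode_raw_header(raw_subject)

msg = EmailMessage()
msg["Subject"] = subject  # plain text, no surrogates left
data = msg.as_bytes()     # the library re-encodes the header itself

print(ascii(subject))
print(data)
```

This sidesteps the failing code path by never handing the split encoded words 
to the header parser, at the cost of not preserving the original encoded 
form.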

----------
components: Library (Lib)
messages: 407379
nosy: marc.villain
priority: normal
severity: normal
status: open
title: EmailMessage as_bytes
type: crash
versions: Python 3.10

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue45938>
_______________________________________
