New submission from Grigory Statsenko <grisha...@gmail.com>: (Discovered together with https://bugs.python.org/msg322348)
Email message serialization (in the function _fold_as_ew) enters an infinite loop when folding non-ASCII headers whose words, once encoded, are longer than the given maxlen. Besides being stuck in an infinite loop, it keeps appending to the `lines` list, so its memory usage also grows without bound. The code keeps appending encoded empty strings to the list, like this:

    lines: [
        'Subject: =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' '
    ]

(and it keeps on growing)

Here is my code that can reproduce this issue (as a unittest):

    import email.generator
    import email.policy
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from unittest import TestCase


    def create_message(subject, sender, recipients, body):
        msg = MIMEMultipart()
        msg.set_charset('utf-8')
        msg.policy = email.policy.SMTP
        msg.attach(MIMEText(body, 'html'))
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = ';'.join(recipients)
        return msg


    class TestEmailMessage(TestCase):
        def _make_message(self, subject):
            return create_message(
                subject=subject,
                sender='m...@site.com',
                recipients=['m...@site.com'],
                body='Some text',
            )

        def test_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Q' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

        def test_non_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Ц' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

The ASCII test passes, but the non-ASCII one never finishes.

From what I can tell, the problem is in line 2728 of email/_header_value_parser.py:

    first_part = first_part[:-excess]

where `excess` is calculated from the encoded string (which is several times longer than the original one), but it truncates the original (non-encoded) string.
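To illustrate the unit mismatch described above, here is a simplified, hypothetical model of one folding step. It is not the stdlib code: `q_encode`, `buggy_first_chunk`, `fixed_first_chunk` and `simulate_fold` are made-up names, the encoder implements only a simplified RFC 2047 "Q" encoding (the real `_fold_as_ew` may choose base64 instead), and the overhead arithmetic (7 characters of chrome plus the charset name) is an assumption. In this model, with `maxheaderlen=76` a 100-character 'Ц' subject gives an excess of 275 encoded characters while only 55 unencoded characters are being sliced, so the chunk becomes empty. `fixed_first_chunk` shows one possible remedy (shrinking the unencoded text until its encoded form fits), not necessarily how CPython fixes it.

```python
def q_encode(text, charset="utf-8"):
    """Hypothetical, simplified RFC 2047 'Q' encoder (not the stdlib one).

    ASCII letters and digits pass through, spaces become '_', and every
    other byte of the charset-encoded text becomes '=XX'.
    """
    encoded = []
    for byte in text.encode(charset):
        if byte < 128 and chr(byte).isalnum():
            encoded.append(chr(byte))
        elif byte == 0x20:
            encoded.append("_")
        else:
            encoded.append("=%02X" % byte)
    return "=?%s?q?%s?=" % (charset, "".join(encoded))


def _text_space(remaining_space, charset):
    # Assumed overhead: "=?" + "?q?" + "?=" (7 chars) plus the charset name.
    return max(remaining_space - len(charset) - 7, 0)


def buggy_first_chunk(to_encode, remaining_space, charset="utf-8"):
    """Mimics the reported bug: excess is measured in encoded characters
    but subtracted from the unencoded string."""
    first_part = to_encode[:_text_space(remaining_space, charset)]
    excess = len(q_encode(first_part, charset)) - remaining_space
    if excess > 0:
        # When excess >= len(first_part), this slice is always ''.
        first_part = first_part[:-excess]
    return first_part


def fixed_first_chunk(to_encode, remaining_space, charset="utf-8"):
    """Hypothetical remedy: drop one unencoded character at a time
    until the encoded word fits in the remaining space."""
    first_part = to_encode[:_text_space(remaining_space, charset)]
    while first_part and len(q_encode(first_part, charset)) > remaining_space:
        first_part = first_part[:-1]
    return first_part


def simulate_fold(to_encode, maxlen, chunker, first_line="Subject: ", max_iterations=10):
    """Capped model of the folding loop; the cap stands in for the real infinite loop."""
    lines = [first_line]
    for _ in range(max_iterations):
        if not to_encode:
            break
        first_part = chunker(to_encode, maxlen - len(lines[-1]))
        lines[-1] += q_encode(first_part)
        to_encode = to_encode[len(first_part):]  # no progress if first_part == ''
        if to_encode:
            lines.append(" ")
    return lines


buggy = simulate_fold("Ц" * 100, 76, buggy_first_chunk)
print(buggy[:3])  # ['Subject: =?utf-8?q??=', ' =?utf-8?q??=', ' =?utf-8?q??=']

fixed = simulate_fold("Ц" * 100, 76, fixed_first_chunk, max_iterations=100)
print(max(len(line) for line in fixed))  # 75
```

The iteration cap exists only so the model terminates. The point is that the buggy chunker never consumes any input, which matches the ever-growing `lines` list quoted in the report, while the shrinking variant makes progress on every pass.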
The problem arises when `excess` is actually greater than the length of `first_part`: the slice then yields an empty string. So in every iteration it attempts to encode the exact same part of the header and fails, instead appending an empty string to the list, encoded as ' =?utf-8?q??='.

What this amounts to is that it's now practically impossible to send emails with non-ASCII subjects without either disregarding the RFC recommendations and requirements for line length or risking hangs and memory leaks.

Just like in https://bugs.python.org/msg322348, this behavior is new in Python 3.6. It also fails in 3.7 and 3.8.

----------
components: email
messages: 322351
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Email message serialization enters an infinite loop when folding non-ASCII headers with long words
versions: Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34222>
_______________________________________