[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2020-08-06 Thread Olivier Dony


Olivier Dony  added the comment:

Somehow the message identifiers in the code sample got messed up in the 
previous comment; here's the actual code, for what it's worth ;-) 
https://gist.github.com/odony/0323eab303dad2077c1277076ecc3733

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2020-08-06 Thread Olivier Dony


Olivier Dony  added the comment:

Further, under Python 3.8 the issue is not fully solved, as other 
identification headers are still being folded in a non-RFC-conformant manner 
(see OP for RFC references). This was indicated on the original PR by the 
author: https://github.com/python/cpython/pull/13397#issuecomment-493618544

This is a less severe problem than for Message-ID, but it still means that an 
MTA/MUA may fail to recognize the threading structure because the identifiers 
are lost.

Is it better to open a new issue for this?


# Example on 3.8.2: the `In-Reply-To` header is RFC2047-folded.

Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import email.message
>>> import email.policy
>>> msg = email.message.EmailMessage(policy=email.policy.SMTP)
>>> msg['Message-Id'] = '<929227342217024.1596730490.324691772460938-example-30661-some.refere...@test-123.example.com>'
>>> msg['In-Reply-To'] = '<92922734221723.1596730568.324691772460444-another-30661-parent.refere...@test-123.example.com>'
>>> print(msg.as_string())
Message-Id: 
<929227342217024.1596730490.324691772460938-example-30661-some.refere...@test-123.example.com>
In-Reply-To: =?utf-8?q?=3C92922734221723=2E1596730568=2E324691772460444-anot?=
 =?utf-8?q?her-30661-parent=2Ereference=40test-123=2Eexample=2Ecom=3E?=
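A quick way to spot this in generated mail is to scan the header block for RFC 2047 encoded words in the identification headers. The following is only a rough sketch: `encoded_id_headers` and `ID_HEADERS` are hypothetical names, not part of the email package, and the regular expression only approximates the encoded-word syntax.

```python
import re

# Approximate RFC 2047 encoded-word shape: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")

# Identification header names (assumed set, lower-cased for comparison).
ID_HEADERS = {"message-id", "in-reply-to", "references", "resent-message-id"}


def encoded_id_headers(raw_message):
    """Return the names of identification headers that contain encoded words."""
    # Keep only the header block, i.e. everything before the first empty line.
    header_block = re.split(r"\r?\n\r?\n", raw_message, maxsplit=1)[0]
    # Unfold continuation lines (a line break followed by whitespace).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", header_block)
    flagged = []
    for line in unfolded.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in ID_HEADERS and ENCODED_WORD.search(value):
            flagged.append(name.strip())
    return flagged
```

Feeding it the output of `msg.as_string()` before handing the message to an SMTP server would give an early warning that a threading identifier is about to be mangled.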




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2020-08-06 Thread Olivier Dony


Olivier Dony  added the comment:

With regard to msg349895, is there any chance this fix could be considered for 
backport?

I imagine you could view it as a new feature, but it seems to be the only 
official fix we have for the fact that Python 3 generates invalid SMTP 
messages. And that's not a minor problem because many popular MTAs (GMail, 
Outlook, etc.) will rewrite non-RFC-conformant Message IDs, causing the 
original ID to be lost and missing in subsequent replies. This breaks an 
important mechanism to support email threads.

To this day, several Linux distributions still ship 3.6 or 3.7, even in their 
latest LTS, and users and vendors are stuck with supporting those for a while.

Thanks!

--
nosy: +odo2




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-12-08 Thread Abhilash Raj


Abhilash Raj  added the comment:

Closing this since it has been fixed in Python 3.8.

--
resolution:  -> fixed
stage: needs patch -> resolved
status: open -> closed
versions:  -Python 3.7




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-08-16 Thread Abhilash Raj


Abhilash Raj  added the comment:

I am not sure whether this should be backported to bugfix branches, since it 
is technically a new feature: the ability to parse the Message-ID field.

I would love to hear what David and Barry think about this.




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-06-04 Thread Barry A. Warsaw


Barry A. Warsaw  added the comment:


New changeset 46d88a113142b26c01c95c93846a89318ba87ffc by Barry Warsaw 
(Abhilash Raj) in branch 'master':
bpo-35805: Add parser for Message-ID email header. (GH-13397)
https://github.com/python/cpython/commit/46d88a113142b26c01c95c93846a89318ba87ffc





[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-05-22 Thread Abhilash Raj


Abhilash Raj  added the comment:

I have made the requested changes on PR.

David, can you please review again?




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-05-17 Thread Abhilash Raj


Abhilash Raj  added the comment:

I have created https://github.com/python/cpython/pull/13397 for this. For now, 
it only parses Message-ID header. 

I do plan to add support for other Identification headers soon, perhaps in a 
2nd PR.

--
nosy: +maxking
stage: patch review -> needs patch




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-05-17 Thread Abhilash Raj


Change by Abhilash Raj :


--
keywords: +patch
pull_requests: +13307
stage: needs patch -> patch review




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-01-22 Thread R. David Murray


R. David Murray  added the comment:

Yes, the correct solution would be to write an actual parser for headers 
containing message ids.  All the pieces needed to do this already exist in 
_header_value_parser, it "just" needs a function that glues them together in 
the right order, and then apply that new top-level parser to the appropriate 
headers via headerregistry.

See also issue 34881.
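To see which headers currently get a dedicated parser, one can ask the default header registry what class it maps a header name to. This is a small sketch, not a statement about any particular Python release: `parser_kind` is a hypothetical helper, and the answer for the identification headers depends on the Python version in use.

```python
from email.headerregistry import HeaderRegistry, UnstructuredHeader

registry = HeaderRegistry()


def parser_kind(header_name):
    """Classify a header by the class the default registry maps it to."""
    # Indexing the registry returns the header class used for that name;
    # unknown names fall back to the unstructured default.
    header_cls = registry[header_name]
    if issubclass(header_cls, UnstructuredHeader):
        return "unstructured"
    return "structured"


for name in ("Message-ID", "In-Reply-To", "References", "To", "X-Custom"):
    print(f"{name}: {parser_kind(name)}")
```

Any identification header reported as unstructured is one that can still be folded with encoded words.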

--
stage:  -> needs patch




[issue35805] email package folds msg-id identifiers using RFC2047 encoded words where it must not

2019-01-22 Thread Martijn Pieters


New submission from Martijn Pieters :

When encountering identifier headers such as Message-ID containing a msg-id 
token longer than 77 characters (including the <...> angle brackets), the email 
package folds that header using RFC 2047 encoded words, e.g.

Message-ID: <154810422972.4.16142961424846318...@aaf39fce-569e-473a-9453-6862595bd8da.prvt.dyno.rt.heroku.com>

becomes

Message-ID: =?utf-8?q?=3C154810422972=2E4=2E16142961424846318784=40aaf39fce-?=
 =?utf-8?q?569e-473a-9453-6862595bd8da=2Eprvt=2Edyno=2Ert=2Eheroku=2Ecom=3E?=

The msg-id token here is this long because Heroku Dyno machines use a UUID in 
the FQDN, but Heroku is hardly the only source of such long msg-id tokens. 
Microsoft's Outlook.com / Office365 email servers balk at the RFC2047 encoded 
word use here and attempt to wrap the email in a TNEF winmail.dat attachment, 
then may fail at this under some conditions that I haven't quite worked out yet 
and deliver an error message to the recipient with the helpful message "554 
5.6.0 Corrupt message content", or just deliver the ever unhelpful winmail.dat 
attachment to the unsuspecting recipient (I'm only noting these symptoms here 
for future searches).

I encountered this issue with long Message-ID values generated by 
email.utils.make_msgid(), but this applies to all RFC 5322 section 3.6.4 
Identification Fields headers, as well as the corresponding headers from RFC 
822 section 4.6 (covered by section 4.5.4 in 5322).

What is happening here is that the email._header_value_parser module has no 
handling for the msg-id tokens *at all*, and email.headerregistry has no 
dedicated header class for identifier headers. So these headers are parsed as 
unstructured, and folded at will.

RFC2047 section 5 on the other hand states that the msg-id token is strictly 
off-limits, and no RFC2047 encoding should be used to encode such elements. 
Because headers *can* exceed 78 characters (RFC 5322 section 2.1.1 states that 
"Each line of characters MUST be no more than 998 characters, and SHOULD be no 
more than 78 characters[.]") I think that RFC5322 msg-id tokens should simply 
not be folded, at all. The obsoleted RFC822 syntax for msg-id makes them equal 
to the addr-spec token, where the local-part (before the @) contains word 
tokens; those would be fair game but then at least apply the RFC2047 encoded 
word replacement only to those word tokens.

For now, I worked around the issue by using a custom policy that uses 998 as 
the maximum line length for identifier headers:

from email.policy import EmailPolicy

# Headers that contain msg-id values, RFC5322
# (the resent-msg-id token is carried by the Resent-Message-ID field)
MSG_ID_HEADERS = {'message-id', 'in-reply-to', 'references', 'resent-message-id'}

class MsgIdExcemptPolicy(EmailPolicy):
    def _fold(self, name, value, *args, **kwargs):
        if (name.lower() in MSG_ID_HEADERS and
                self.max_line_length - len(name) - 2 < len(value)):
            # RFC 5322, section 2.1.1: "Each line of characters MUST be no
            # more than 998 characters, and SHOULD be no more than 78
            # characters, excluding the CRLF.". To avoid msg-id tokens from
            # being folded by means of RFC2047, fold identifier lines to the
            # max length instead.
            return self.clone(max_line_length=998)._fold(
                name, value, *args, **kwargs)
        return super()._fold(name, value, *args, **kwargs)

This ignores the fact that In-Reply-To and References contain foldable 
whitespace in between each msg-id, but it at least let us send email through 
smtp.office365.com again without confusing recipients.
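As a rough usage sketch of the workaround, the policy is simply passed when the message is created. The class is repeated here in condensed form so the snippet runs on its own; the identifier and subject are made up for illustration, and the approach still relies on the private `_fold` hook, so it may break with future changes to the email package.

```python
from email.message import EmailMessage
from email.policy import EmailPolicy

MSG_ID_HEADERS = {"message-id", "in-reply-to", "references", "resent-message-id"}


class MsgIdExcemptPolicy(EmailPolicy):
    """Condensed copy of the workaround policy (overrides the private _fold)."""

    def _fold(self, name, value, *args, **kwargs):
        too_long = self.max_line_length - len(name) - 2 < len(value)
        if name.lower() in MSG_ID_HEADERS and too_long:
            # Retry with the RFC 5322 hard limit so the msg-id stays on one line.
            return self.clone(max_line_length=998)._fold(name, value, *args, **kwargs)
        return super()._fold(name, value, *args, **kwargs)


# Hypothetical identifier, longer than the default 78-character line limit.
parent_id = "<" + "1234567890." * 7 + "parent@test-123.example.com>"

msg = EmailMessage(policy=MsgIdExcemptPolicy())
msg["In-Reply-To"] = parent_id
msg["Subject"] = "threading test"
msg.set_content("body")

serialized = msg.as_string()
print(serialized)
```

For sending over SMTP, a variant built with `MsgIdExcemptPolicy(linesep="\r\n")` would presumably be the closer match to `email.policy.SMTP`.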

--
components: email
messages: 334210
nosy: barry, mjpieters, r.david.murray
priority: normal
severity: normal
status: open
title: email package folds msg-id identifiers using RFC2047 encoded words where it must not
versions: Python 3.7, Python 3.8
