[issue46392] MessageIDHeader is too strict for message-id

2022-01-18 Thread bpoaugust


bpoaugust  added the comment:

Sorry, I think '' is not valid, as spaces are not allowed between 
words.

However I am not seeing the original unfolded source if there is an error, 
unless I am misunderstanding the API.

For example:

--- cut here ---
import email.header
import email.utils
import email.policy

def test(test):
    msg_string = f"Message-id: {test}"
    message = email.message_from_string(msg_string, policy=email.policy.default)
    out = message['Message-id']
    print(test)
    print(out)

test('') # invalid
test('') # valid
--- cut here ---

This produces:


 # truncated at error



i.e. the invalid input is truncated
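
One way to see the original header text regardless of how the structured parser treats it is to parse the same source with the legacy compat32 policy, which hands back the stored value as a plain string. The following is only a sketch: the helper name raw_header_value and the sample Message-ID are made up for illustration and are not taken from this report.

```python
import email
import email.policy


def raw_header_value(source, name="Message-ID"):
    """Return the header value as stored by the legacy compat32 policy.

    compat32 does no structured parsing of the value, so nothing after a
    syntax error is dropped. (Hypothetical helper, for illustration.)
    """
    msg = email.message_from_string(source, policy=email.policy.compat32)
    return msg[name]


# Hypothetical header with a trailing comment; not one of the ids from this report.
sample = "Message-ID: <abc@example.com> (a comment)"
print(raw_header_value(sample))
```

Note that compat32 does not unfold the value, so a header folded across several lines comes back with its line breaks still embedded.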

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46392] MessageIDHeader is too strict for message-id

2022-01-18 Thread R. David Murray


R. David Murray  added the comment:

The general idea is that the string version of the header should contain all of 
the original information, but the parsed elements (the things returned by 
special header attributes) will contain the valid data, if any.  So if the 
string version of the header is being truncated or transformed (other than 
whitespace changes during re-folding), that is a bug.

Your examples involve comment fields, and I'm afraid that my development of the 
parser stopped before I did very much with comments.  Therefore I am not 
surprised that comments are handled incorrectly :(  They aren't very common 
in the wild, as far as I was able to tell, which is why they were my last 
priority.
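
A rough way to check the behaviour described above is to compare the string form of the header with the input and to look at the defects recorded on the header object. This is a minimal sketch using only the public API; the helper name inspect_header and the sample id are assumptions for illustration, and what is reported for malformed input may differ between Python versions.

```python
import email
import email.policy


def inspect_header(source, name="Message-ID"):
    """Return (string form, defects) for one header under the default policy.

    Hypothetical helper, for illustration only.
    """
    msg = email.message_from_string(source, policy=email.policy.default)
    header = msg[name]
    # Header objects are str subclasses that also carry a tuple of defects.
    return str(header), tuple(header.defects)


# Hypothetical well-formed id; not one of the ids from this report.
text, defects = inspect_header("Message-ID: <abc@example.com>")
print(repr(text))
for defect in defects:
    print(type(defect).__name__, defect)
```

If the string form differs from the input by more than whitespace, that would be the kind of bug described in this comment.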




[issue46392] MessageIDHeader is too strict for message-id

2022-01-18 Thread bpoaugust


bpoaugust  added the comment:

I think an id of the form



should be allowed, but it generates

 obs-id-left => local-part => obs-local-part => word *("." word)
word => atom => [CFWS] 1*atext [CFWS]

'' should also be allowed but generates ' (A A)'
and '' gives ' '




[issue46392] MessageIDHeader is too strict for message-id

2022-01-18 Thread bpoaugust


bpoaugust  added the comment:

When the library is being used to parse existing emails, I think it needs to do 
the minimum validation and canonicalisation.

It may be useful in some circumstances to report where the input is not 
syntactically correct, but I'm not sure it is helpful to truncate the input at 
the first syntax error.

When the library is used to generate emails, validation should be very strict.




[issue46392] MessageIDHeader is too strict for message-id

2022-01-17 Thread R. David Murray


R. David Murray  added the comment:

Note that the parser does attempt to accept obsolete syntax (registering 
defects for it), so if there is a bug in the implementation of the obsolete 
syntax handling it should be fixed.  And yes, there have been other bugs with 
whitespace handling in the parser, unfortunately.

Examples would be most helpful, even if you don't write unit tests.  Most of 
the tests, by the way, are in test__header_value_parser (search for 
message_id).  There aren't very many, so more would be good.
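
For anyone wanting to contribute examples, a test case could look roughly like the sketch below. It calls parse_message_id from the private module email._header_value_parser, which is what the existing tests exercise but which is not a public API and may change between versions. The class name and the sample id are hypothetical.

```python
import unittest
from email import _header_value_parser as parser  # private module, may change


class MessageIdParseSketch(unittest.TestCase):
    """Illustrative test only; not part of the CPython test suite."""

    def test_plain_msg_id_round_trips(self):
        # Hypothetical well-formed id, not one of the ids from this report.
        source = "<abc@example.com>"
        tree = parser.parse_message_id(source)
        # The string form of the parse tree should preserve the input...
        self.assertEqual(str(tree), source)
        # ...and a well-formed id should register no defects.
        self.assertEqual(list(tree.all_defects), [])
```

Saved to a file, this can be run with python -m unittest. Failing inputs from this report could be added as further test methods in the same shape.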




[issue46392] MessageIDHeader is too strict for message-id

2022-01-16 Thread bpoaugust


bpoaugust  added the comment:

The easiest might be for me to provide some test cases, but I have not been 
able to work out where the existing unit tests are.

One failing case that I believe should be permitted under the current rules is:

 - i.e. trailing space
The space gets added AFTER the >

However the following is parsed correctly:

 - i.e. trailing space but with previous comment

The obsolete rules I referred to are here:
https://datatracker.ietf.org/doc/html/rfc5322#section-4




[issue46392] MessageIDHeader is too strict for message-id

2022-01-16 Thread Eric V. Smith


Eric V. Smith  added the comment:

In what way is it too strict? What "obsolete rules" are you referring to? What 
are some example Message-Ids that should be considered valid but instead get 
truncated? What changes are you proposing?

--
nosy: +eric.smith




[issue46392] MessageIDHeader is too strict for message-id

2022-01-15 Thread bpoaugust


New submission from bpoaugust :

The email headerregistry class MessageIDHeader is too strict when parsing 
existing Message-Ids. It can truncate Message-Ids that are valid according to 
the obsolete rules.

As the saying has it: 
"Be liberal in what you accept, and conservative in what you send."

I think the parser should be much closer to the UnstructuredHeader.
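
As a possible workaround in the meantime, a custom policy can map Message-ID to UnstructuredHeader so that the value is kept as written. This is only a sketch of that idea, not a proposed patch: the names lenient_policy and lenient_message_id and the sample id are made up for illustration.

```python
import email
import email.policy
from email.headerregistry import UnstructuredHeader

# A freshly constructed EmailPolicy gets its own header registry, so the
# mapping below does not alter email.policy.default.
lenient_policy = email.policy.EmailPolicy()
lenient_policy.header_factory.map_to_type("message-id", UnstructuredHeader)


def lenient_message_id(source):
    """Parse source and return its Message-ID as an unstructured header.

    Hypothetical helper, for illustration only.
    """
    msg = email.message_from_string(source, policy=lenient_policy)
    return msg["Message-ID"]


# Hypothetical id with a space in the local part; not taken from this report.
value = lenient_message_id("Message-ID: <local part@example.com>")
print(value)
```

The trade-off is that no msg-id validation happens at all with this mapping, so any checking has to be done by the caller.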

--
components: email
messages: 410665
nosy: barry, bpoaugust, r.david.murray
priority: normal
severity: normal
status: open
title: MessageIDHeader is too strict for message-id
