[issue30701] Exception parsing certain invalid email address headers

2019-09-12 Thread Tim Bell


Tim Bell added the comment:

This appears to be the same issue as subsequently reported in #34155.

--

___
Python tracker 
<https://bugs.python.org/issue30701>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2018-11-28 Thread Tim Bell


Tim Bell added the comment:

I've addressed the points in the last few comments and created a new PR (10783).

--

___
Python tracker 
<https://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2018-11-28 Thread Tim Bell


Change by Tim Bell:


--
keywords: +patch
pull_requests: +10025
stage:  -> patch review

___
Python tracker 
<https://bugs.python.org/issue30681>
___



[issue30988] Exception parsing invalid email address headers starting or ending with dot

2017-07-21 Thread Tim Bell

Changes by Tim Bell <timothyb...@gmail.com>:


--
pull_requests: +2862

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30988>
___



[issue30988] Exception parsing invalid email address headers starting or ending with dot

2017-07-21 Thread Tim Bell

New submission from Tim Bell:

Email addresses whose display name starts with a dot ("."), or ends with a dot 
with no whitespace before the angle bracket, trigger exceptions when the header 
is accessed on a message object created with the "default" policy.

For example:

>>> import email
>>> from email.policy import default
>>> email.message_from_bytes(b'To: . Doe <j...@example.com>')['to']
'. Doe <j...@example.com>'
>>> email.message_from_bytes(b'To: . Doe <j...@example.com>',
...                          policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Users/bhat/git/cpython/Lib/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in parse
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in <listcomp>
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 834, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 768, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 931, in display_name
    if res[0][0].token_type == 'cfws':
AttributeError: 'str' object has no attribute 'token_type'
>>>
>>> email.message_from_bytes(b'To: John X.<j...@example.com>')['to']
'John X.<j...@example.com>'
>>> email.message_from_bytes(b'To: John X.<j...@example.com>',
...                          policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Users/bhat/git/cpython/Lib/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Users/bhat/git/cpython/Lib/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in parse
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/headerregistry.py", line 344, in <listcomp>
    for mb in addr.all_mailboxes]))
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 834, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 768, in display_name
    return self[0].display_name
  File "/Users/bhat/git/cpython/Lib/email/_header_value_parser.py", line 936, in display_name
    if res[-1][-1].token_type == 'cfws':
AttributeError: 'str' object has no attribute 'token_type'
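
A small harness along the following lines could make such reports easier to reproduce across Python versions. It is only a sketch: the helper name probe_header and the sample address are illustrative and not part of the email package, and whether the "default" policy raises depends on the interpreter version in use.

```python
import email
from email import policy


def probe_header(raw: bytes, name: str) -> dict:
    """Fetch header `name` under both policies and record what happens.

    Hypothetical helper for reproducing reports like this one; it returns
    a dict mapping the policy label to ("ok", value) or ("error", exc name).
    """
    results = {}
    for label, pol in (("compat32", policy.compat32), ("default", policy.default)):
        msg = email.message_from_bytes(raw, policy=pol)
        try:
            results[label] = ("ok", str(msg[name]))
        except Exception as exc:  # broad on purpose: we want to see what escapes
            results[label] = ("error", type(exc).__name__)
    return results


# Illustrative address, not the one from the report.
report = probe_header(b"To: . Doe <jdoe@example.com>", "to")
for label, outcome in report.items():
    print(label, outcome)
```

On an affected interpreter the "default" entry would be expected to show an error, while compat32 returns the raw string unchanged.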

--
components: email
messages: 298836
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: Exception parsing invalid email address headers starting or ending with dot
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30988>
___



[issue30701] Exception parsing certain invalid email address headers

2017-06-19 Thread Tim Bell

Tim Bell added the comment:

I'm using the email package to ingest a firehose of spam; spammers aren't known 
for following norms or standards, so it's not surprising that I'm discovering 
lots of edge cases.

I'll supply fixes for what I find where I can, time permitting.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30701>
___



[issue30701] Exception parsing certain invalid email address headers

2017-06-19 Thread Tim Bell

New submission from Tim Bell:

According to RFC 5322, an email address like this isn't valid:

u...@example.com <u...@example.com>

(The display-name "u...@example.com" contains "@", which isn't in the set of 
atext characters used to form an atom.)
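
As a rough illustration of that rule, the atext set from RFC 5322 section 3.2.3 can be written out and checked directly. The function name is_atom_text is made up for this sketch, and it ignores quoted-strings and the obsolete syntax, which a real parser also has to handle.

```python
import string

# atext per RFC 5322 section 3.2.3: letters, digits and these 19 specials.
ATEXT = frozenset(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_atom_text(word: str) -> bool:
    """Return True if `word` is non-empty and made only of atext characters."""
    return bool(word) and all(ch in ATEXT for ch in word)


print(is_atom_text("John"))              # plain word: only atext characters
print(is_atom_text("user@example.com"))  # "@" and "." are not atext
```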

How it's handled by the email package varies by policy:

>>> import email
>>> from email.policy import default
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>')['to']
'u...@example.com <u...@example.com>'
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>',
...                          policy=default)['to']
'u...@example.com'
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com>',
...                          policy=default).defects
[]

The difference between the behaviour under the compat32 vs "default" policy may 
or may not be significant.

However, if coupled with a further invalid feature, namely a space after the 
">", here's what happens:

>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ')['to']
'u...@example.com <u...@example.com> '
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ',
...                          policy=default)['to']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 391, in __getitem__
    return self.get(name)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/policy.py", line 162, in header_fetch_parse
    return self.header_factory(name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 586, in __call__
    return self[name](name, value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 197, in __new__
    cls.parse(value, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 337, in parse
    kwds['parse_tree'] = address_list = cls.value_parser(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/headerregistry.py", line 328, in value_parser
    address_list, value = parser.get_address_list(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2368, in get_address_list
    token, value = get_invalid_mailbox(value, ',')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 2166, in get_invalid_mailbox
    token, value = get_phrase(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1770, in get_phrase
    token, value = get_word(value)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/_header_value_parser.py", line 1745, in get_word
    if value[0]=='"':
IndexError: string index out of range
>>> email.message_from_bytes(b'To: u...@example.com <u...@example.com> ',
...                          policy=default).defects
[]

I believe that the preferred behaviour would be to add a defect to the message 
object during parsing instead of throwing an exception when the invalid header 
value is accessed.
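
Until the parser records a defect instead, calling code can guard the access itself. The following is a minimal sketch, assuming that falling back to the unparsed header text is acceptable; safe_header is a hypothetical helper, and the sample address is illustrative. It relies on Message.raw_items(), which yields the stored (name, value) pairs without running them through the policy's header parsing.

```python
import email
from email.policy import default


def safe_header(msg, name):
    """Return header `name` as a string, or None if it is absent.

    Hypothetical helper: if parsing the header raises, fall back to the
    raw, unparsed value stored on the message.
    """
    try:
        value = msg[name]
        return None if value is None else str(value)
    except Exception:  # broad on purpose: spam triggers many parser edge cases
        wanted = name.lower()
        for key, raw in msg.raw_items():
            if key.lower() == wanted:
                return str(raw)
        return None


# Illustrative input modelled on the report (note the trailing space).
msg = email.message_from_bytes(
    b"To: user@example.com <user@example.com> \n\nbody", policy=default
)
print(repr(safe_header(msg, "to")))
```

The fallback loses the structured attributes of the header object, so this only helps when the string form is all that is needed.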

--
components: email
messages: 296309
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: Exception parsing certain invalid email address headers
type: behavior
versions: Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30701>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-17 Thread Tim Bell

Tim Bell added the comment:

I've updated the pull request to incorporate Barry's suggestion of a new defect 
for this situation, InvalidDateDefect.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-16 Thread Tim Bell

Tim Bell added the comment:

Thanks for the feedback. I've made a new pull request which addresses the 
points raised.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-16 Thread Tim Bell

Changes by Tim Bell <timothyb...@gmail.com>:


--
pull_requests: +2304

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-15 Thread Tim Bell

Tim Bell added the comment:

My proposed solution (in https://github.com/python/cpython/pull/2229) is 
two-part:

1. change parsedate_to_datetime() to return None rather than raising an 
exception; and

2. change headerregistry.DateHeader.parse() to check for None being returned 
from parsedate_to_datetime(), and to add a defect; the datetime attribute is 
set to None (as if the Date header were missing), but the header still 
evaluates as a string to the supplied header value.

I'm not sure what the use case is for distinguishing between a missing Date 
header and an invalid date value, but can't that be distinguished by the 
different defects added to the header?

In any case, if I'm not fully grasping the context and parsedate_to_datetime() 
should continue to throw exceptions, then a slightly different modification to 
DateHeader to catch those exceptions would seem sensible, and would address my 
use case.
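
The behaviour proposed in point 1 can be approximated outside the standard library with a thin wrapper. This is a sketch only, not the code from the pull request; parse_date_or_none is a name invented here, and it assumes TypeError and ValueError are the only failures worth swallowing.

```python
from email.utils import parsedate_to_datetime


def parse_date_or_none(value):
    """Return a datetime for `value`, or None if it cannot be parsed.

    Hypothetical wrapper: depending on the Python version, an unparseable
    date surfaces as TypeError or ValueError, so both are caught.
    """
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None


print(parse_date_or_none("0"))
print(parse_date_or_none("Tue, 06 Jun 2017 27:39:33 +0600"))
print(parse_date_or_none("Tue, 06 Jun 2017 10:39:33 +0600"))
```

Both failure modes from the original report map to None here, while a well-formed date still yields an aware datetime.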

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-15 Thread Tim Bell

Changes by Tim Bell <timothyb...@gmail.com>:


--
pull_requests: +2274

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___



[issue30681] email.utils.parsedate_to_datetime() should return None when date cannot be parsed

2017-06-15 Thread Tim Bell

New submission from Tim Bell:

Python 3.6 documentation for email.utils.parsedate_to_datetime() says "Performs 
the same function as parsedate(), but on success returns a datetime." The docs 
for parsedate() say "If it succeeds in parsing the date...; otherwise None will 
be returned." By implication, parsedate_to_datetime() should return None when 
the date can't be parsed.

There are two different failure modes for parsedate_to_datetime():

1. When _parsedate_tz() fails to parse the date and returns None:

>>> from email.utils import parsedate_to_datetime
>>> parsedate_to_datetime('0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 210, in parsedate_to_datetime
    *dtuple, tz = _parsedate_tz(data)
TypeError: 'NoneType' object is not iterable


2. When _parsedate_tz() succeeds, but conversion to datetime.datetime fails:

>>> parsedate_to_datetime('Tue, 06 Jun 2017 27:39:33 +0600')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 214, in parsedate_to_datetime
    tzinfo=datetime.timezone(datetime.timedelta(seconds=tz)))
ValueError: hour must be in 0..23


Note that this second case is the one that led me to this issue. I am using the 
email package to parse spam emails for subsequent analysis, and a certain group 
of spam emails contain invalid hour fields in their Date header. I don't 
require the invalid Date header to be converted to a datetime.datetime, but 
accessing email_message['date'] to access the header value as a string triggers 
the ValueError exception. I can work around this with a custom email policy, 
but the observed behaviour does seem to contradict the documented behaviour.
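
The custom-policy workaround mentioned above might look roughly like this. It is a sketch under assumptions: LenientPolicy is a name invented here, not the policy actually used, and it simply returns the raw header text whenever structured parsing raises one of the listed exceptions.

```python
import email
from email.policy import EmailPolicy


class LenientPolicy(EmailPolicy):
    """Hypothetical policy that never lets header parsing errors escape."""

    def header_fetch_parse(self, name, value):
        try:
            return super().header_fetch_parse(name, value)
        except (TypeError, ValueError, IndexError, AttributeError):
            # Fall back to the unparsed text; structured attributes such as
            # .datetime are not available in this case.
            return value


msg = email.message_from_bytes(
    b"Date: Tue, 06 Jun 2017 27:39:33 +0600\n\nbody", policy=LenientPolicy()
)
print(str(msg["date"]))
```

Overriding header_fetch_parse keeps the fix in one place, at the cost of returning a plain str rather than a header object for the affected headers.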

Also, in relation to https://bugs.python.org/issue15925, r.david.murray 
commented "Oh, and I'm purposely allowing parsedate_to_datetime throw 
exceptions.  I suppose that should be documented, but that's a separate issue." 
However, no argument for why parsedate_to_datetime throwing exceptions is 
desired was given.

--
components: email
messages: 296137
nosy: barry, r.david.murray, timb07
priority: normal
severity: normal
status: open
title: email.utils.parsedate_to_datetime() should return None when date cannot be parsed
type: behavior
versions: Python 3.5, Python 3.6

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30681>
___