[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-29 Thread miss-islington


miss-islington  added the comment:


New changeset a6ae02d7e91cfe63c9b65b803ae24a40d2864bc0 by Miss Islington (bot) 
in branch '3.9':
bpo-39040: Fix parsing of email mime headers with whitespace between 
encoded-words. (gh-17620)
https://github.com/python/cpython/commit/a6ae02d7e91cfe63c9b65b803ae24a40d2864bc0


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-29 Thread miss-islington


miss-islington  added the comment:


New changeset 6381ee077d3c69d2f947f7bf87d8ec76e0caf189 by Miss Islington (bot) 
in branch '3.8':
bpo-39040: Fix parsing of email mime headers with whitespace between 
encoded-words. (gh-17620)
https://github.com/python/cpython/commit/6381ee077d3c69d2f947f7bf87d8ec76e0caf189





[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-29 Thread miss-islington


miss-islington  added the comment:


New changeset 5f977e09e8a29dbd5972ad79c4fd17a394d1857f by Miss Islington (bot) 
in branch '3.7':
bpo-39040: Fix parsing of email mime headers with whitespace between 
encoded-words. (gh-17620)
https://github.com/python/cpython/commit/5f977e09e8a29dbd5972ad79c4fd17a394d1857f





[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-28 Thread R. David Murray


R. David Murray  added the comment:


New changeset 21017ed904f734be9f195ae1274eb81426a9e776 by Abhilash Raj in 
branch 'master':
bpo-39040: Fix parsing of email mime headers with whitespace between 
encoded-words. (gh-17620)
https://github.com/python/cpython/commit/21017ed904f734be9f195ae1274eb81426a9e776





[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-28 Thread miss-islington


Change by miss-islington :


--
pull_requests: +19752
pull_request: https://github.com/python/cpython/pull/20506




[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-28 Thread miss-islington


Change by miss-islington :


--
pull_requests: +19751
pull_request: https://github.com/python/cpython/pull/20505




[issue39040] Wrong attachement filename when mail mime header was too long

2020-05-28 Thread miss-islington


Change by miss-islington :


--
nosy: +miss-islington
nosy_count: 4.0 -> 5.0
pull_requests: +19750
pull_request: https://github.com/python/cpython/pull/20504




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-24 Thread Abhilash Raj


Abhilash Raj  added the comment:

I double-checked: there should be 4 commits in the PR, and the last 2 contain 
the changes you asked for in the test case and NEWS entry.

Your previous comment points at the old diff, so you may have to look at the 
full diff here: https://github.com/python/cpython/pull/17620/files or, if you 
want, this is the diff for the 2 commits with the changes you requested: 
https://github.com/python/cpython/pull/17620/files/bf2cb76009d72869d9df6550b473b5818ceab311..016ceb3ef00b3b940993d35d539ce63d68437d4f




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-24 Thread R. David Murray


R. David Murray  added the comment:

I don't see the change to the test in the PR.  Did you miss a push, or is GitHub 
doing something wonky with the review?  (I haven't used GitHub review in a 
while and had forgotten how hard it is to use...)




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-17 Thread Abhilash Raj


Abhilash Raj  added the comment:

Sure, fixed as per your comments in the PR.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-17 Thread R. David Murray


R. David Murray  added the comment:

One more tweak to the test and we'll be good to go.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-16 Thread Abhilash Raj


Abhilash Raj  added the comment:

Thanks David! I applied the fixes as per your comments; can you please take 
another look?




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-16 Thread R. David Murray


R. David Murray  added the comment:

In general your solution looks good, just a few naming comments and an 
additional test request.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-15 Thread Abhilash Raj


Abhilash Raj  added the comment:

Thanks for the pointer, David! I created a PR for the fix; would you be able to 
review it, please?




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-15 Thread Abhilash Raj


Change by Abhilash Raj :


--
pull_requests: +17090
pull_request: https://github.com/python/cpython/pull/17620




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-15 Thread R. David Murray


R. David Murray  added the comment:

The example you want to look at is get_unstructured.  That shows both lookback 
and modification of the parse tree to handle the whitespace between encoded 
words.
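To see the behaviour of get_unstructured from the public API, here is a minimal sketch, assuming Python 3 with the `email.policy.default` policy (whose `Subject` header is parsed by `get_unstructured`). The header text is made-up sample data: two adjacent encoded words with a fold between them, where the whitespace separating them is dropped on decoding.

```python
from email import message_from_string
from email.policy import default

# A folded Subject header made of two adjacent encoded words (illustrative data).
raw = (
    "Subject: =?utf-8?q?Schulbesuchsbest=C3=A4ttigung.?=\n"
    " =?utf-8?q?pdf?=\n"
    "\n"
)

msg = message_from_string(raw, policy=default)
subject = str(msg["Subject"])
# The fold is unfolded to a single space, and that space is then ignored
# because it only separates two encoded words (RFC 2047, section 6.2).
print(subject)
```

This is exactly the lookback the bare-quoted-string case lacks: the unstructured parser notices "encoded word, whitespace, encoded word" and neutralises the middle token.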




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-15 Thread Abhilash Raj

Abhilash Raj  added the comment:

I tried to take a look at the code to see where the fix needs to be and I 
probably need some help.

I looked at the parse tree for the header and it looks something like this:

    ContentDisposition([
        Token([ValueTerminal('attachment')]),
        ValueTerminal(';'),
        MimeParameters([
            Parameter([
                Attribute([CFWSList([WhiteSpaceTerminal(' ')]), ValueTerminal('filename')]),
                ValueTerminal('='),
                Value([QuotedString([BareQuotedString([
                    EncodedWord([ValueTerminal('Schulbesuchsbestättigung.')]),
                    WhiteSpaceTerminal(''),
                    EncodedWord([ValueTerminal('pdf')])
                ])])])
            ])
        ])
    ])
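A tree like the one above can be printed with a sketch such as the following. It calls `parse_content_disposition_header` from `email._header_value_parser`, which is an internal CPython module rather than a documented API, so treat those names as an assumption that may change between releases. The header value is sample data in the same shape as the report, not the attached .eml file.

```python
# NOTE: email._header_value_parser is internal to CPython; its names are not
# a public API and may change between releases.
from email._header_value_parser import parse_content_disposition_header

# Unfolded Content-Disposition value in the Gmail style from this report
# (illustrative sample data).
value = ('attachment; filename='
         '"=?utf-8?q?Schulbesuchsbest=C3=A4ttigung.?= =?utf-8?q?pdf?="')

tree = parse_content_disposition_header(value)
print(repr(tree))  # nested ContentDisposition / MimeParameters / ... tokens
print(str(tree))   # the value re-serialised from the parse tree
```

On an interpreter that includes the fix from this issue, the re-serialised value should no longer contain a space before "pdf".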


The offending piece of code, which seems to be working as designed, is 
get_bare_quoted_string() in email/_header_value_parser.py:

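    while value and value[0] != '"':
        if value[0] in WSP:
            # Whitespace becomes an FWS token, even when it only separates
            # two encoded words.
            token, value = get_fws(value)
        elif value[:2] == '=?':
            try:
                # Encoded words are decoded, but flagged as a defect because
                # RFC 2047 does not allow them inside a quoted string.
                token, value = get_encoded_word(value)
                bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
                    "encoded word inside quoted string"))
            except errors.HeaderParseError:
                token, value = get_qcontent(value)
        else:
            token, value = get_qcontent(value)
        bare_quoted_string.append(token)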

It just loops and parses the values. We cannot ignore the FWS until we know 
that the atoms before and after the FWS are both encoded words. I can't seem to 
find a clean way to look ahead (which could perhaps be used in get_parameters()) 
or look back (which could be used after parsing the entire bare_quoted_string?) 
in the parse tree to delete the offending whitespace.
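The look-back idea can be sketched on a toy token list. This is a simplified model using hypothetical `(kind, text)` tuples, not the parser's real TokenList/Terminal classes: whenever an encoded word is appended, look back two tokens and drop whitespace that sits between two encoded words, while whitespace next to ordinary quoted text is kept. The sample tokens are illustrative.

```python
from typing import List, Tuple

# Toy token: (kind, text), where kind is "ew" (decoded encoded word),
# "fws" (folding whitespace) or "qcontent" (ordinary quoted text).
Token = Tuple[str, str]


def collapse_ew_whitespace(tokens: List[Token]) -> List[Token]:
    out: List[Token] = []
    for kind, text in tokens:
        # Look back when an encoded word arrives: if the two previous tokens
        # are an encoded word followed by whitespace, that whitespace only
        # separated two encoded words, so drop it.
        if (kind == "ew" and len(out) >= 2
                and out[-1][0] == "fws" and out[-2][0] == "ew"):
            out.pop()
        out.append((kind, text))
    return out


def render(tokens: List[Token]) -> str:
    return "".join(text for _, text in tokens)


gmail_style = [("ew", "Schulbesuchsbestättigung."), ("fws", " "), ("ew", "pdf")]
mixed = [("qcontent", "Bericht"), ("fws", " "), ("ew", "März.pdf")]

print(render(collapse_ew_whitespace(gmail_style)))
print(render(collapse_ew_whitespace(mixed)))
```

Doing the check at the moment the second encoded word is appended means no separate look-ahead pass over the input is needed.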

Any example of such kind of parse-tree manipulation in the code base would be 
awesome!

--
versions: +Python 3.9 -Python 3.5, Python 3.6




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-14 Thread Karthikeyan Singaravelan


Change by Karthikeyan Singaravelan :


--
nosy: +maxking




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-14 Thread R. David Murray


R. David Murray  added the comment:

And you are right that this is a very common bug in email programs.  So common 
that I suspect the RFC folks will eventually have to accept it as a de-facto 
standard.  So we do need to support it in the python email library.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-14 Thread R. David Murray


R. David Murray  added the comment:

Yes, Google should fix their bug.  However, the Python email package tries very 
hard to interpret even RFC-non-compliant emails when there is a way to do so.  
As I said, the package already tries to interpret headers such as the ones Google 
is generating; it's just that there is a bug in that interpretation: it is keeping 
the blank between the encoded words when it should not.  That bug can be 
fixed in get_raw_encoded_word and/or get_parameter, in 
email._header_value_parser.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-14 Thread Manfred Kaiser


Manfred Kaiser  added the comment:

As you mentioned, RFC 2047 forbids encoded words in quoted strings.

Source: https://tools.ietf.org/html/rfc2047, Section 5 (3)

I have tested a few webmail clients and they have the same issue. According to 
the RFCs this is not allowed, but I think it is widely used.

Should we fix this problem?




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Manfred Kaiser  added the comment:

RFC 2184 was obsoleted by RFC 2231 (https://www.rfc-editor.org/rfc/rfc2231.html).

It also has no quoted strings of the kind Google uses.
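For reference, a minimal sketch of producing the RFC 2231 extended-parameter form with the standard-library helpers `email.utils.encode_rfc2231` and `email.utils.decode_rfc2231`. Note that `decode_rfc2231` only splits the value into charset, language and text, so the percent-decoding is done separately with `urllib.parse.unquote`. The filename is the one from this report.

```python
from email.utils import decode_rfc2231, encode_rfc2231
from urllib.parse import unquote

filename = "Schulbesuchsbestättigung.pdf"

# Percent-encode the UTF-8 bytes and prefix the charset and an empty language,
# giving the charset'language'value form used by RFC 2231 extended parameters.
encoded = encode_rfc2231(filename, "utf-8")
print(f"filename*={encoded}")

# Split the value back into its three parts, then undo the percent-encoding.
charset, language, text = decode_rfc2231(encoded)
roundtrip = unquote(text, encoding=charset)
print(roundtrip)
```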




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Manfred Kaiser  added the comment:

Thanks for your response. I have found the RFC: 
https://tools.ietf.org/html/rfc2184

Gmail creates incorrect headers that are not RFC-compliant.
The problem is that many people use Gmail, so emails sent from Gmail may be 
malformed.

How can we solve this problem? It is not a Python problem. We can create 
workarounds, but in my opinion Google has to fix the bug.
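One possible workaround for interpreters that do not include the fix is to pre-process the raw header block before parsing. The sketch below is an illustration under stated assumptions, not an established recipe: the helper name `collapse_ew_gaps` is hypothetical, it is meant to be applied to the header section only (never the body), and it relies on RFC 2047's rule that whitespace separating two adjacent encoded words is not part of the text. The sample headers are made up.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical helper: remove whitespace, including a folded line break,
# that separates two adjacent encoded words ("...?=" followed by "=?...").
_EW_GAP = re.compile(r"(\?=)\s+(=\?)")


def collapse_ew_gaps(header_block: str) -> str:
    return _EW_GAP.sub(r"\1\2", header_block)


# Gmail-style headers (illustrative sample data), without the body.
headers = (
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment;\n"
    ' filename="=?utf-8?q?Schulbesuchsbest=C3=A4ttigung.?=\n'
    ' =?utf-8?q?pdf?="\n'
)

cleaned = collapse_ew_gaps(headers)
# The blank line terminates the header section; the body is empty here.
msg = message_from_string(cleaned + "\n", policy=default)
print(msg.get_filename())
```

Removing the gap before parsing, rather than editing the decoded filename afterwards, avoids guessing which spaces in the result were real.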




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread R. David Murray


R. David Murray  added the comment:

That header is *completely* non-RFC compliant.  If gmail generated that header 
there is something very wrong in google-land :(

The RFC compliant formatting for that header looks like this:

Content-Disposition: attachment;
 filename*=utf-8''Schulbesuchsbest%C3%A4ttigung.pdf

You will note that this is nothing like encoded word format.  Encoded words are 
not valid inside quoted strings, and quoted strings can't be used in mime 
header attributes if there are non-ascii characters involved.  Nor can encoded 
words.  
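A minimal sketch, assuming Python 3 with the `email.policy.default` policy, that parses a message carrying the RFC-compliant header above and reads the filename back with `get_filename()`. The `Content-Type` line and the empty body are illustrative filler.

```python
from email import message_from_string
from email.policy import default

# RFC 2231 extended parameter: charset'language'percent-encoded-value.
raw = (
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment;\n"
    " filename*=utf-8''Schulbesuchsbest%C3%A4ttigung.pdf\n"
    "\n"
)

msg = message_from_string(raw, policy=default)
filename = msg.get_filename()  # charset-decoded by the email package
print(filename)
```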

Now, all that said, there is an obvious rule that can be followed to understand 
what that header is trying to convey, and the current parser already implements 
most of it (you will find comments about it in the parser, as well as defects 
being registered).  So, a patch to _header_value_parser to fix the error 
recovery will be accepted.  I've looked at the code to remind myself, but not 
deeply enough to be *sure* where the changes need to be made.  There are two 
possibilities I see off the bat (and both may need fixing): 
get_bare_quoted_string and get_parameter.  Either one or both of those may be 
forgetting that whitespace between encoded words should be dropped.
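To check this error recovery from the public API, the sketch below feeds a Gmail-style header (two encoded words inside a quoted string, with a fold between them) to `message_from_string` with `email.policy.default`. The sample header is reconstructed for illustration, not copied from the attached .eml files. On a Python that includes the gh-17620 change, the filename should come back without the stray blank, and the header should still record the "encoded word inside quoted string" defect.

```python
from email import message_from_string
from email.policy import default

# Non-compliant, Gmail-style header (illustrative sample data).
raw = (
    "Content-Type: application/pdf\n"
    "Content-Disposition: attachment;\n"
    ' filename="=?utf-8?q?Schulbesuchsbest=C3=A4ttigung.?=\n'
    ' =?utf-8?q?pdf?="\n'
    "\n"
)

msg = message_from_string(raw, policy=default)
filename = msg.get_filename()
# Defects registered while parsing the header, as plain strings.
defects = [str(d) for d in msg["Content-Disposition"].defects]
print(filename)
print(defects)
```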




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Change by Manfred Kaiser :


Added file: https://bugs.python.org/file48777/original_mail_from_gmail.eml




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Manfred Kaiser  added the comment:

The mail was sent from the Gmail web interface.




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser

Manfred Kaiser  added the comment:

The original filename is "Schulbesuchsbestättigung.pdf", but when I use the 
method "get_filename" I get "Schulbesuchsbestättigung. pdf".

I removed some headers from the mail for privacy reasons.

--
Added file: https://bugs.python.org/file48776/error.eml




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Change by Manfred Kaiser :


Added file: https://bugs.python.org/file48775/testscript.py




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread R. David Murray


R. David Murray  added the comment:

Thanks for the report.  Can you provide an example that reproduces the problem? 
 

Per the RFC, lines may be broken before whitespace in certain places in certain 
headers, but that does not make the whitespace go away.  Only the CRLF sequence 
is removed when unfolding the header, per the RFC, so your proposed fix is 
incorrect. I suspect your example header is invalid, and the question will then 
become whether there is some sort of Postel-style error recovery we can, and want 
to, do in the function that parses the Content-Disposition header.
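The unfolding rule can be illustrated with a small sketch. The helper name `unfold` and the sample value are hypothetical, and this is not how the email package implements it internally. Only the CRLF in front of the whitespace is removed, so the space between the two encoded words survives unfolding; any decision to drop it has to be made later, by the parser.

```python
import re

# RFC 5322, section 2.2.3: unfold by removing any CRLF that is immediately
# followed by WSP (space or tab). The WSP itself stays in the value.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(field_body: str) -> str:
    return _FOLD.sub("", field_body)


folded = ('attachment;\r\n filename="=?utf-8?q?Schulbesuchsbest=C3=A4ttigung.?=\r\n'
          ' =?utf-8?q?pdf?="')
unfolded = unfold(folded)
print(unfolded)
```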




[issue39040] Wrong attachement filename when mail mime header was too long

2019-12-13 Thread Manfred Kaiser


Change by Manfred Kaiser :


--
title: Wrong filename in mail when mime header was too long -> Wrong 
attachement filename when mail mime header was too long
