[issue43493] EmailMessage mis-folding headers of a certain length

2021-07-06 Thread R. David Murray


R. David Murray  added the comment:

Ah, yes, the problem is more subtle than I thought.

The design here is that we should be starting with the largest lexical unit, 
seeing if that fits on the current line, or a line by itself, and if so, using 
that, and if not, move down to the next smaller lexical unit and try again, 
until we are finally left with an unbreakable unit.  For unstructured headers 
such as Subject the lexical units should be encoded words followed by blank 
delimited words.  I'm guessing the code is treating the collection of words it 
has accumulated as a unit in the above algorithm, and since it fits on a line 
by itself, it goes with that.  So yeah, it's sort of intentional.

So the bug here is that in your step 2 we ideally want to be considering 
whether the last token on the current line is at the same lexical level as the 
token that precedes it...and if so, and if moving that token to the next line 
lets the remainder fit on the first line, we should do that.  Exactly how to 
implement that correctly is a good question...it's been too long since I wrote 
that code, and I may not have time to investigate it more deeply.

If you come up with something based on my description of the intent above, I 
should be able to review it (though you might need to ping me directly to get 
my attention).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43493] EmailMessage mis-folding headers of a certain length

2021-07-02 Thread Andrei Kulakov


Andrei Kulakov  added the comment:

I've looked into this and it seems to be somewhat intentional, as can be seen 
in this test case for example:

test_headerregistry.py", line 1725, in test_fold_address_list
+ To: "Theodore H. Perfect" ,
+  "My address is very long because my name is long" ,
+  "Only A. Friend" 

Relevant code is here: 
https://github.com/python/cpython/blob/main/Lib/email/_header_value_parser.py#L2829-L2849

The logic goes like this:
tstr = header value
 - try to add it to current line, if fits; continue to start of loop
 - try to add it to next line, if fits; continue to start of loop
 - split tstr; continue to start of loop

So as you can see from test case, if split happened before step 2, the name 
would be split over 2 lines which is not ideal.

I tested splitting before step 2, which fixed this bug but failed 11 test 
cases, all of which deal with email address folding. (To and From headers).

So, is this actually an issue worth fixing?

If the answer is yes, one option would be to special case Subject header and 
split it before going to step 2.

IOW, the block that starts here: 
https://github.com/python/cpython/blob/4bcef2bb48b3fd82011a89c1c716421b789f1442/Lib/email/_header_value_parser.py#L2835

would need to be moved after the next block that starts 6 lines below.

--
nosy: +andrei.avk

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43493] EmailMessage mis-folding headers of a certain length

2021-03-18 Thread R. David Murray


R. David Murray  added the comment:

Parsing and newlines have nothing to do with this bug, actually.  I don't think 
your foldfix post-processing is going to do what you want in the general case.

The source of the bug here is in the folding algorithm in _header_value_parser. 
 It has checks to see if the "text so far" will fit within the header width, 
and it starts a new line under vafious conditions.  For example, if there is a 
single word after Subject: whose length is, say, 70, it would produce the 
effect you show, because the single word would fit without folding or encoding 
on a new line.  I don't think this violates the RFC.  What your example shows 
makes it look like the folder is treating all of the text as if it were a 
single word, which is obviously wrong.  It is supposed to break at spaces.  You 
will note that if you increase the repeat count in your example to 16 it folds 
the line correctly.  So the bug has something to do with the total text so far 
accumulated for the line being right in that window where it won't fit on the 
first line but does fit on a line by itself.  This is obviously a bug in the 
folder, since it should be splitting that text if it isn't a sin
 gle word, not moving it to a new line as a whole.

Note that this bug is still present on master.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43493] EmailMessage mis-folding headers of a certain length

2021-03-17 Thread Mike Glover


Mike Glover  added the comment:

Further research shows that email.parser.Parser is not handling the affected 
lines correctly -- the leading '\n ' is not being stripped from the header 
value.

Attached is the (ugly, worksforme) function I'm using to workaround this problem

--
Added file: https://bugs.python.org/file49889/foldfix.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43493] EmailMessage mis-folding headers of a certain length

2021-03-14 Thread Mike Glover


New submission from Mike Glover :

The attached file demonstrates the incorrect folding behavior I'm seeing.  
Header lines of a certain total length get folded after the colon following the 
header name, which is not valid RFC.  Slightly longer or shorter lines are 
folded correctly.

Interestingly, the test file produces correct output on 3.5.2

$ python --version
Python 3.8.5

$ sudo apt install python3
...
python3 is already the newest version (3.8.2-0ubuntu2).

(yes, that difference has me scratching my head)

And yes, I realize this is not the latest release of the 3.8 branch, but it 
*is* the latest available through apt on Ubuntu 20.04 LTS, and a search of the 
issue tracker and the release notes for all of 3.8.* turned up nothing 
applicable.

--
components: email
files: header_misfolding.py
messages: 388687
nosy: barry, mglover, r.david.murray
priority: normal
severity: normal
status: open
title: EmailMessage mis-folding headers of a certain length
type: behavior
versions: Python 3.8
Added file: https://bugs.python.org/file49875/header_misfolding.py

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com