[Patches] [ python-Patches-1555570 ] email parser incorrectly breaks headers with a CRLF at 8192

SourceForge.net Tue, 20 Mar 2007 01:23:34 -0800

Patches item #1555570, was opened at 2006-09-10 00:41
Message generated for change (Comment added) made by pmoore
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1555570&group_id=5470


Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email parser incorrectly breaks headers with a CRLF at 8192

Initial Comment:
If a message has a CRLF as part of a header that starts
at 8191, the parser will incorrectly consider the
headers to finish at 8191 and the body to start at
8193, which leaves headers in the body of the message.

This problem occurs because the parser reads 8192
characters at a time.  If 8192 is a '\r' and 8193 a
'\n', then when the second block is parsed, it will
appear to be a blank line (i.e. header separator).

The simplest fix for this is to just read an extra
character if the last one read is a '\r'.  This appears
to work fine without breaking anything, although I
suppose that an alternative would be to change the
FeedParser to check whether the '\n' belonged with the
previous data.

A patch and test are attached, against SVN of 10/Sept/06.

----------------------------------------------------------------------

Comment By: Paul Moore (pmoore)
Date: 2007-03-20 09:23

Message:
Logged In: YES 
user_id=113328
Originator: NO

The documentation isn't clear, but I would say that the code is written on
the basis that the file is opened in text or universal newline mode (so
that CRLF will ve translated to \n). On that assumption, the patch is
unnecessary, as CRLF will never appear in the data. (Actually, the
existence of Utils.fix_eols makes me wonder...)

On the other hand, 99% of the time, the code will work fine if a CRLF file
is opened in binary mode. This patch fixes one of the obscure corner cases,
and (as far as I can see) does no real harm - I can't imagine a *real* mail
file containing literal CR bytes which should be preserved!

The documentation should be updated to require that the file is in text or
universal newline mode, and note that files with CRLF opened in binary mode
could be processed incorrectly.

I'm probably +0 on applying this patch, on the basis that it makes the
code more robust given the current lack of any documentation of the
required file mode. If the documentation is patched as above, I'd be -0.

If the rest of the email module is known not to robust in the face of
files with CRLF opened in binary mode, the documentation should definitely
be changed, and that fact made very clear. (This patch becomes irrelevant
then).

BTW, the test case looks wrong - surely the line
'self.assertNotEqual(data[8193], "\n")' should check for a \r? I assume
it's asserting that the line after the break is not the end of the headers
- if so, that line will be \r\n as the earlier translation changes LF to
CRLF.

Summary: CRLF handling is a bit of a can of worms - the module
documentation could do with being clearer in any case, but the patch is
probably good to apply on the basis that it fixes a small corner case
without causing harm elsewhere. The test should be fixed as above.

----------------------------------------------------------------------

Comment By: Jeremy Dunck (jdunck)
Date: 2006-12-06 17:54

Message:
Logged In: YES 
user_id=516425
Originator: NO

We're hitting this bug as well.
It affects Django, which handles media upload over HTTP by parsing
multipart with email.message_from_string.

pegasusnews.com, FWIW.

----------------------------------------------------------------------

Comment By: Graham King (graham_king)
Date: 2006-12-06 11:30

Message:
Logged In: YES 
user_id=1237146
Originator: NO

 I had exactly the same problem parsing a POST encoded as
'multipart/form-data' with lots of parts. The patch provided fixes it. I'm
on ActivePython 2.4.3 so I applied the patch by hand.

 +1 for integrating this patch into Python.

 Thanks,
 Graham King.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1555570&group_id=5470
_______________________________________________
Patches mailing list
[email protected]
http://mail.python.org/mailman/listinfo/patches

[Patches] [ python-Patches-1555570 ] email parser incorrectly breaks headers with a CRLF at 8192

Reply via email to