Patches item #1555570, was opened at 2006-09-10 00:41 Message generated for change (Comment added) made by pmoore You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1555570&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tony Meyer (anadelonbrin) Assigned to: Barry A. Warsaw (bwarsaw) Summary: email parser incorrectly breaks headers with a CRLF at 8192 Initial Comment: If a message has a CRLF as part of a header that starts at 8191, the parser will incorrectly consider the headers to finish at 8191 and the body to start at 8193, which leaves headers in the body of the message. This problem occurs because the parser reads 8192 characters at a time. If 8192 is a '\r' and 8193 a '\n', then when the second block is parsed, it will appear to be a blank line (i.e. header separator). The simplest fix for this is to just read an extra character if the last one read is a '\r'. This appears to work fine without breaking anything, although I suppose that an alternative would be to change the FeedParser to check whether the '\n' belonged with the previous data. A patch and test are attached, against SVN of 10/Sept/06. ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2007-03-20 09:23 Message: Logged In: YES user_id=113328 Originator: NO The documentation isn't clear, but I would say that the code is written on the basis that the file is opened in text or universal newline mode (so that CRLF will ve translated to \n). On that assumption, the patch is unnecessary, as CRLF will never appear in the data. (Actually, the existence of Utils.fix_eols makes me wonder...) On the other hand, 99% of the time, the code will work fine if a CRLF file is opened in binary mode. This patch fixes one of the obscure corner cases, and (as far as I can see) does no real harm - I can't imagine a *real* mail file containing literal CR bytes which should be preserved! The documentation should be updated to require that the file is in text or universal newline mode, and note that files with CRLF opened in binary mode could be processed incorrectly. I'm probably +0 on applying this patch, on the basis that it makes the code more robust given the current lack of any documentation of the required file mode. If the documentation is patched as above, I'd be -0. If the rest of the email module is known not to robust in the face of files with CRLF opened in binary mode, the documentation should definitely be changed, and that fact made very clear. (This patch becomes irrelevant then). BTW, the test case looks wrong - surely the line 'self.assertNotEqual(data[8193], "\n")' should check for a \r? I assume it's asserting that the line after the break is not the end of the headers - if so, that line will be \r\n as the earlier translation changes LF to CRLF. Summary: CRLF handling is a bit of a can of worms - the module documentation could do with being clearer in any case, but the patch is probably good to apply on the basis that it fixes a small corner case without causing harm elsewhere. The test should be fixed as above. ---------------------------------------------------------------------- Comment By: Jeremy Dunck (jdunck) Date: 2006-12-06 17:54 Message: Logged In: YES user_id=516425 Originator: NO We're hitting this bug as well. It affects Django, which handles media upload over HTTP by parsing multipart with email.message_from_string. pegasusnews.com, FWIW. ---------------------------------------------------------------------- Comment By: Graham King (graham_king) Date: 2006-12-06 11:30 Message: Logged In: YES user_id=1237146 Originator: NO I had exactly the same problem parsing a POST encoded as 'multipart/form-data' with lots of parts. The patch provided fixes it. I'm on ActivePython 2.4.3 so I applied the patch by hand. +1 for integrating this patch into Python. Thanks, Graham King. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1555570&group_id=5470 _______________________________________________ Patches mailing list Patches@python.org http://mail.python.org/mailman/listinfo/patches