Feature Requests item #795081, was opened at 2003-08-25 23:37
Message generated for change (Comment added) made by collinwinter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=795081&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

>Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stuart D. Gathman (customdesigned)
>Assigned to: Barry A. Warsaw (bwarsaw)
Summary: email.Message param parsing problem II

Initial Comment:
The enclosed real life (inactivated) virus message causes
email.Message to fail to find the multipart attachments. This is
because the headers following Content-Type are indented, causing
email.Message to properly append them to Content-Type. The trick is
that the boundary is quoted, and Outhouse^H^H^H^H^Hlook apparently
gets a value of 'bound' for boundary, whereas email.Message gets the
value '"bound"\n\tX-Priority...'. email.Utils.unquote apparently
gives up and doesn't remove any quotes.

I believe that unquote should return just what is between the quotes,
so that '"abc" def' would be unquoted to 'abc'. In fact, my email
filtering software (http://bmsi.com/python/milter.html) works
correctly on all kinds of screwy mail using my version of unquote
using this heuristic. I believe that the header used by the virus is
invalid, so a STRICT parser should reject it, but a tolerant parser
(such as a virus scanner would use) should use the heuristic.

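The folding described above can be reproduced without the attachment. The following is a minimal sketch for a current Python 3 interpreter, not the original test case: the RAW text is a hypothetical reconstruction of the problem header (the real test/virus5 file is not shown in this thread), and only the standard email package is used.

```python
import email

# Hypothetical reconstruction of the problem header; the real
# test/virus5 attachment is not reproduced here.
RAW = (
    'Content-Type: multipart/mixed;\n'
    '\tboundary="bound"\n'
    '\tX-Priority: 3\n'
    '\tX-MSMail-Priority: Normal\n'
    '\n'
    '--bound\n'
    'Content-Type: text/plain\n'
    '\n'
    'body\n'
    '--bound--\n'
)

msg = email.message_from_string(RAW)

# Lines that start with whitespace are continuation lines, so the
# parser folds them into the Content-Type value instead of treating
# them as separate headers.
content_type = msg['Content-Type']
print(repr(content_type))

# X-Priority therefore does not exist as a header of its own.
print(msg['X-Priority'])
```

The point of the sketch is only that the indented X-Priority line ends up inside the Content-Type value; how the boundary parameter is then unquoted is the subject of the rest of this report.
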
Here is a brief script to show the problem (attached file in
test/virus5):

----------t.py----------
import email
msg = email.message_from_file(open('test/virus5','r'))
print msg.get_params()
---------------------
$ python2 t.py
[('multipart/mixed', ''), ('boundary', '"bound"\n\tX-Priority: 3\n\tX-MSMail-Priority: Normal\n\tX-Mailer: Microsoft Outlook Express 5.50.4522.1300\n\tX-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1300')]

----------------------------------------------------------------------

>Comment By: Collin Winter (collinwinter)
Date: 2007-03-30 10:58

Message:
Logged In: YES
user_id=1344176
Originator: NO

I'm still seeing this behaviour as of Python 2.6a0.

Barry: I take it email-sig didn't get around to discussing this?

----------------------------------------------------------------------

Comment By: Barry A. Warsaw (bwarsaw)
Date: 2003-11-21 15:45

Message:
Logged In: YES
user_id=12800

Moving this to feature requests for Python 2.4. If appropriate,
the email-sig should address this in the intended new lax parser
for email 3.0 / Python 2.4. We can't add this to the Python 2.3
(or earlier) maintenance releases.

----------------------------------------------------------------------

Comment By: Stuart D. Gathman (customdesigned)
Date: 2003-08-25 23:57

Message:
Logged In: YES
user_id=142072

Here is a proposed fix for email.Utils.unquote (except it should
test for a 'strict' mode flag, which is currently only in Parser):

def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"'):
            if str.endswith('"'):
                str = str[1:-1]
            else: # remove garbage after trailing quote
                try: str = str[1:str[1:].index('"')+1]
                except: return str
            return str.replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str

Actually, I replaced only email.Message._unquotevalue for my
application to minimize the impact.
That would also be a good place to check for a STRICT flag stored
with the message object. Perhaps the Parser should set the Message
_strict flag from its own _strict flag.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
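
The strict/tolerant split discussed in this thread could be sketched roughly as follows. This is an illustration written for Python 3, not code from the email package: unquote_value and its strict parameter are hypothetical names, and how the flag would travel from the Parser to the Message is left open, as it is in the comments above.

```python
def unquote_value(value, strict=False):
    """Unquote a header parameter value.

    Hypothetical helper, not part of the email package. In tolerant
    mode, anything after the closing quote is discarded, following
    the heuristic proposed in this report. In strict mode, trailing
    text after the closing quote raises ValueError instead.
    """
    if len(value) > 1 and value.startswith('"'):
        if value.endswith('"'):
            inner = value[1:-1]
        else:
            end = value.find('"', 1)
            if end == -1:
                # No closing quote at all: leave the value untouched.
                return value
            if strict:
                raise ValueError(
                    'text after closing quote: %r' % value[end + 1:])
            # Tolerant mode: keep only what is between the quotes.
            inner = value[1:end]
        return inner.replace('\\\\', '\\').replace('\\"', '"')
    if len(value) > 1 and value.startswith('<') and value.endswith('>'):
        return value[1:-1]
    return value


# Tolerant use, as a virus scanner might want.
print(unquote_value('"bound"\n\tX-Priority: 3'))
```

Like the proposed fix, this sketch looks for the first quote after the opening one and does not account for escaped quotes inside the value.
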
