[issue28945] get_boundary invokes unquote twice

Eric Lafontaine Mon, 19 Dec 2016 13:38:55 -0800

Eric Lafontaine added the comment:

Hi all,


I believe this is the correct behavior, and whatever generated the boundary 
"<<>>" is the problem.


RFC 2046, page 22:
_____________________
The only mandatory global parameter for the "multipart" media type is the 
boundary parameter, which consists of 1 to 70 characters from a set of 
characters known to be very robust through mail gateways, and NOT ending with 
white space. (If a boundary delimiter line appears to end with white space, the 
white space must be presumed to have been added by a gateway, and must be 
deleted.)  It is formally specified by the following BNF:

     boundary := 0*69<bchars> bcharsnospace

     bchars := bcharsnospace / " "

     bcharsnospace := DIGIT / ALPHA / "'" / "(" / ")" /
                      "+" / "_" / "," / "-" / "." /
                      "/" / ":" / "=" / "?"
_____________________
In other words, the only valid boundary characters are:
0123456789 abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'()+_,-./:=?

Any other character should be removed to get the boundary right.  I believe the 
issue is that the invalid characters weren't removed in the first place.  It is 
a bug in my opinion, but the other way around :).
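
As a rough illustration of the BNF quoted above, the grammar can be turned into 
a regular expression.  This is only a sketch: is_valid_boundary and the two 
module-level names are hypothetical helpers made up for this example, and none 
of them exist in the email package.

```python
import re

# Hypothetical helpers for illustration only; not part of the email package.
# bcharsnospace from the BNF: DIGIT / ALPHA / ' ( ) + _ , - . / : = ?
_BCHARSNOSPACE = r"0-9A-Za-z'()+_,\-./:=?"

# boundary := 0*69<bchars> bcharsnospace
# i.e. up to 69 characters that may include spaces, then one non-space one.
_BOUNDARY_RE = re.compile(
    "[" + _BCHARSNOSPACE + " ]{0,69}[" + _BCHARSNOSPACE + "]"
)


def is_valid_boundary(value):
    """Return True if value matches the RFC 2046 boundary grammar."""
    return _BOUNDARY_RE.fullmatch(value) is not None


print(is_valid_boundary("<<>>"))     # False: '<' and '>' are not allowed
print(is_valid_boundary("abc def"))  # True: spaces are fine in the middle
print(is_valid_boundary("abc "))     # False: must not end with a space
```

A check like this only reports whether a boundary is valid; it does not try to 
repair one.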

Funny thing is that the unquote function only removes the first and last 
characters it sees, whether they are a pair of '"' or a '<' and '>' pair:

def unquote(str):
    """Remove quotes from a string."""
    if len(str) > 1:
        if str.startswith('"') and str.endswith('"'):
            return str[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if str.startswith('<') and str.endswith('>'):
            return str[1:-1]
    return str
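
To make the effect from the issue title concrete, here is a small sketch of 
what happens when unquote runs twice on the boundary "<<>>".  The function body 
is copied from the listing above with the parameter renamed to s; the names raw, 
once and twice are just for this example.

```python
def unquote(s):
    """Remove quotes from a string (copy of the function quoted above)."""
    if len(s) > 1:
        if s.startswith('"') and s.endswith('"'):
            return s[1:-1].replace('\\\\', '\\').replace('\\"', '"')
        if s.startswith('<') and s.endswith('>'):
            return s[1:-1]
    return s


raw = '"<<>>"'         # parameter value as it appears in the header
once = unquote(raw)    # strips the surrounding double quotes
twice = unquote(once)  # second call strips one '<' ... '>' pair as well

print(once)   # <<>>
print(twice)  # <>
```

Each call strips exactly one layer, so the number of calls changes the result.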

Now, if I modify unquote to keep only the characters listed above, would I 
break something? Probably. 
I haven't found any other RFC defining boundaries that tells me which format 
is supported.  Can someone help me with that?

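This is what the function should look like:

import re
import string

# Matches every character outside the RFC 2046 boundary set.  The trailing
# "|=" additionally matches "=", so "=" is stripped as well, even though the
# RFC lists it among bcharsnospace.
_INVALID_BOUNDARY_CHARS = re.compile(
    '[^' + string.ascii_letters + string.digits + r""" '()+_,\-./:=?]|=""")

def get_boundary(value):
    """Return the valid boundary parameter as per RFC 2046 page 22."""
    if len(value) > 1:
        # Drop the invalid characters, then any trailing spaces.
        return _INVALID_BOUNDARY_CHARS.sub('', value).rstrip(' ')
    return value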

import unittest

class boundary_tester(unittest.TestCase):
    def test_get_boundary(self):
        boundary1 = """ abc def gh< 123 >!@ %!%' """
        ref_boundary1 = """ abc def gh 123  '""" # this is the valid Boundary
        ret_value = get_boundary(boundary1)
        self.assertEqual(ret_value,ref_boundary1)

    def test_get_boundary2(self):
        boundary1 = ''.join((' ',string.printable))
        # this is the valid boundary
        ref_boundary1 = (' 0123456789abcdefghijklmnopqrstuvwxyz'
                         'ABCDEFGHIJKLMNOPQRSTUVWXYZ\'()+,-./:?_')
        ret_value = get_boundary(boundary1)
        self.assertEqual(ret_value,ref_boundary1)


I believe this should be added to the get_boundary method of 
email.message.Message.

Regards,
Eric Lafontaine

----------
nosy: +Eric Lafontaine

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28945>
_______________________________________