On 29 Sep 09:01, Daniel Axtens wrote: > >> +def sanitise_header(header_contents, header_name=None): > >> + """Given a header with header_contents, optionally labelled > >> + header_name, decode it with decode_header, sanitise it to make > >> + sure it decodes correctly and contains no invalid characters. > >> + Then encode the result with make_header() > >> + """ > >> + > >> + # We have some Py2/Py3 issues here. > >> + # > >> + # Firstly, the email parser (before we get here) > >> + # Python 3: headers with weird chars are email.header.Header > >> + # class, others as str > >> + # Python 2: every header is an str > >> + # > >> + # Secondly, the behaviour of decode_header: > >> + # Python 3: weird headers are labelled as unknown-8bit > >> + # Python 2: weird headers are not labelled differently > >> + # > >> + # Lastly, aking matters worse, in Python2, unknown-8bit doesn't > >> + # seem to be supported as an input to make_header, so not only do > >> + # we have to detect dodgy headers, we have to fix them ourselves. > >> + # > >> + # We solve this by catching any Unicode errors, and then manually > >> + # handling any interesting headers. > > > > I'm going to move all the above into the docstring, if that's OK by > > you. > > I don't mind, but my understanding was that the docstring described the > function, rather than the implementation details, hence why I had it in > a comment originally.
You're right. Let's keep this as is. > >> + # python2 - no support in make_header for unknown-8bit > >> + # We should do unknown-8bit coding ourselves. > >> + # For now, we're just going to replace any dubious > >> + # chars with ?. > >> + # > >> + # TODO: replace it with a proper QP unknown-8bit codec. > > > > How could we resolve this TODO in the future? > > > > I think we use email.charset. Possibly we can copy some of the Py3 code > that implements unknown-8bit. It all looks quite finnicky. Also, any > time we get a message with an invalid header and we're relying on the > patchwork header display to figure it out is an edge case on top of > another edge case, so I didn't think it worth much more time. Yeah, we can solve this if it comes up. Just good to see that there is a potential resolution if needed. > >> + # on Py2, we want to do unicode(), on Py3, str(). > >> + # That gets us the decoded, un-wrapped header. > >> + if six.PY2: > >> + header_str = unicode(sane_header) > >> + else: > >> + header_str = str(sane_header) > > > > nit: this looks like something we could use the 'six.u()' function for? > > > > https://pythonhosted.org/six/#six.u > > > I thought this too. Then I looked at the code for django.utils.six: > https://github.com/django/django/blob/master/django/utils/six.py#L646 > > > def u(s): > return unicode(s.replace(r'\\', r'\\\\'), "unicode_escape") > > That won't work because we're passing in a header class to u(). The > header class, as far as I know, doesn't provide .replace(), and even if > it did it's not clear we want to call it. Good spot. Thanks for checking this out. > >> + strings = [('%s: %s' % (key, header.encode())) > > > > How come '.encode' is suitable here, yet we need to call 'unicode'/'str' > > in the 'clean_header' function? > > Because the header class overrides .encode(), __unicode__ and __str__, > and they do different things. > > For headers: > > - str (py3) and unicode (py2) provides a non-line-wrapped version of > the header, in unicode - that is, not in quoted-printable or base64 > 7-bit form. > > - encode provides a line-wrapped, 7-bit encoded form of the header. > > For find_headers, we're creating the headers that are displayed when you > click on the 'show' link on the patch display, under Details, and just > beneath the message id and state. These are supposed to be fully encoded > and line wrapped: they're supposed to look like mail headers. > > For clean_header, we're looking for the human readable form: decoded > From quoted printable or base64, converted from whatever charset they > were in, and not wrapped. > > Hence the code. Admittedly this is not super clear, but I blame the API > of the email.Header module. Ah, I understand you now. I'll stick some of those comments in there. This looks good now so, hence: Reviewed-by: Stephen Finucane <[email protected]> and applied. _______________________________________________ Patchwork mailing list [email protected] https://lists.ozlabs.org/listinfo/patchwork
