If there is a non-ascii character in a header, parsing fails, even on Py27.
Try to decode headers as UTF-8, but if that fails, replace the offending bytes with a character marking that decoding failed. See: https://docs.python.org/3/howto/unicode.html#python-s-unicode-support This is handy for mails with malformed headers containing weird bytes. Reported-by: Thomas Monjalon <[email protected]> Signed-off-by: Daniel Axtens <[email protected]> --- Many thanks to Thomas for his help debugging this. Happy to bikeshed whether we want 'replace' or perhaps 'backslashreplace'. Not keen on 'ignore'; it has an interesting security history - but willing to entertain convincing arguments. This should probably go to a stable branch too. We'll need to start some discussion about how to handle bug fixes for people not running git mainline (like ozlabs.org and kernel.org). Tests to prevent this recurring to come. Python 3 patches to come also. --- patchwork/parser.py | 1 + 1 file changed, 1 insertion(+) diff --git a/patchwork/parser.py b/patchwork/parser.py index 1805df8cda7f..d3f55634f530 100644 --- a/patchwork/parser.py +++ b/patchwork/parser.py @@ -157,6 +157,7 @@ def find_date(mail): def find_headers(mail): return reduce(operator.__concat__, ['%s: %s\n' % (k, Header(v, header_name=k, + charset='utf-8', errors='replace', continuation_ws='\t').encode()) for (k, v) in list(mail.items())]) -- 2.7.4 _______________________________________________ Patchwork mailing list [email protected] https://lists.ozlabs.org/listinfo/patchwork
