If a header contains a non-ASCII character, parsing fails,
even on Python 2.7.

Try to decode headers as UTF-8, but if that fails, replace the
offending bytes with a character marking that decoding failed.
See:
https://docs.python.org/3/howto/unicode.html#python-s-unicode-support

This is handy for mails with malformed headers containing weird
bytes.

Reported-by: Thomas Monjalon <thomas.monja...@6wind.com>
Signed-off-by: Daniel Axtens <d...@axtens.net>

---

Many thanks to Thomas for his help debugging this.

Happy to bikeshed whether we want 'replace' or perhaps
'backslashreplace'. Not keen on 'ignore'; it has an interesting
security history - but willing to entertain convincing arguments.
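
For reference, a minimal sketch of how the three error handlers differ
on a stray byte. The sample bytes are made up for illustration, and it
assumes Python 3.5 or later, where 'backslashreplace' is also accepted
when decoding (it is not on Python 2.7):

```python
# Hypothetical malformed header value: 0xe9 is Latin-1 'e-acute',
# which is not a valid UTF-8 sequence on its own.
raw = b'Caf\xe9 patch'

# 'replace' substitutes U+FFFD (the Unicode replacement character).
replaced = raw.decode('utf-8', errors='replace')

# 'backslashreplace' keeps the offending byte visible as an escape.
backslashed = raw.decode('utf-8', errors='backslashreplace')

# 'ignore' silently drops the byte, so the damage is invisible.
ignored = raw.decode('utf-8', errors='ignore')

print(repr(replaced))     # 'Caf\ufffd patch'
print(repr(backslashed))  # 'Caf\\xe9 patch'
print(repr(ignored))      # 'Caf patch'
```

With 'replace' and 'backslashreplace' the reader can still see that
something was wrong with the header; with 'ignore' the bytes vanish
without a trace.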

This should probably go to a stable branch too. We'll need to start
some discussion about how to handle bug fixes for people not running
git mainline (like ozlabs.org and kernel.org).

Tests to prevent a regression will follow, as will Python 3
patches.
---
 patchwork/parser.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 1805df8cda7f..d3f55634f530 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -157,6 +157,7 @@ def find_date(mail):
 def find_headers(mail):
     return reduce(operator.__concat__,
                   ['%s: %s\n' % (k, Header(v, header_name=k,
+                                           charset='utf-8', errors='replace',
                                            continuation_ws='\t').encode())
                    for (k, v) in list(mail.items())])
 
-- 
2.7.4

_______________________________________________
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
