Hi,

Stephen's suggested patch is a bit better here, so drop this for now.
v2, tests, etc. to come.

Regards,
Daniel

> If there is a non-ascii character in a header, parsing fails,
> even on Py27.
>
> Try to decode headers as UTF-8, but if that fails, replace the
> offending bytes with a character marking that decoding failed.
> See:
> https://docs.python.org/3/howto/unicode.html#python-s-unicode-support
>
> This is handy for mails with malformed headers containing weird
> bytes.
>
> Reported-by: Thomas Monjalon <thomas.monja...@6wind.com>
> Signed-off-by: Daniel Axtens <d...@axtens.net>
>
> ---
>
> Many thanks to Thomas for his help debugging this.
>
> Happy to bikeshed whether we want 'replace' or perhaps
> 'backslashreplace'. Not keen on 'ignore'; it has an interesting
> security history - but willing to entertain convincing arguments.
>
> This should probably go to a stable branch too. We'll need to start
> some discussion about how to handle bug fixes for people not running
> git mainline (like ozlabs.org and kernel.org).
>
> Tests to prevent this recurring to come. Python 3 patches to come
> also.
> ---
>  patchwork/parser.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 1805df8cda7f..d3f55634f530 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -157,6 +157,7 @@ def find_date(mail):
>  def find_headers(mail):
>      return reduce(operator.__concat__,
>                    ['%s: %s\n' % (k, Header(v, header_name=k,
> +                                           charset='utf-8', errors='replace',
>                                             continuation_ws='\t').encode())
>                     for (k, v) in list(mail.items())])
>
> --
> 2.7.4
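
For anyone weighing in on 'replace' vs 'backslashreplace' vs 'ignore', here is a rough sketch of how the three handlers treat a bad byte. It uses plain bytes.decode() rather than the Header() call in the patch, and the header value is made up for illustration. It assumes Python 3.5 or later, since 'backslashreplace' is not available for decoding before that.

```python
# Hypothetical header bytes: 0xe9 is Latin-1 for 'e-acute' and is not
# valid UTF-8 when followed by a plain ASCII byte.
raw = b"Subject: caf\xe9 patch"

# The default 'strict' handler raises, which is the failure mode the
# patch is trying to avoid.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD, so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

# 'backslashreplace' keeps the original byte value as an escape sequence.
escaped = raw.decode("utf-8", errors="backslashreplace")

# 'ignore' silently drops the byte, so the result looks clean but is not
# what was sent.
ignored = raw.decode("utf-8", errors="ignore")

print(strict_failed)
print(repr(replaced))
print(repr(escaped))
print(repr(ignored))
```

Only 'ignore' leaves no trace that a byte was lost, which is the property I'm wary of above.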