Re: [PATCH v2 09/10] parsemail: Convert to a management command

Daniel Axtens Sun, 28 Aug 2016 00:06:51 -0700

> +    def handle(self, *args, **options):
> +        # Attempt to parse the path if provided, and fallback to stdin if not
> +        if args:
> +            logger.info('Parsing mail loaded by filename')
> +            with open(args[0]) as file_:
> +                mail = message_from_file(file_)
> +        else:
> +            logger.info('Parsing mail loaded from stdin')
> +            mail = message_from_file(sys.stdin)
> +


So, I have found an interesting case here, not strictly related to this
patch but related to parsing messages from files.

I have been testing with some messages from this list from earlier this
month. One [0] includes the following sequence:

000018f0  69 65 73 20 76 69 65 77  29 20 3f c2 a0 20 48 6f  |ies view) ?.. Ho|

Note the sequence "c2 a0". Both bytes are > 127 and therefore not part
of 7-bit ASCII.

Apparently this is the UTF-8 encoding of a non-breaking space:
http://stackoverflow.com/a/2774507/463510
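
A quick way to check that claim is sketched below. It only uses the
standard library; the variable names are made up for illustration.

```python
import unicodedata

# The two bytes seen in the hex dump above.
raw = bytes(bytearray([0xC2, 0xA0]))

# Decoded as UTF-8, the pair forms a single code point.
char = raw.decode('utf-8')
name = unicodedata.name(char)

print(repr(char), name)
```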

email.message_from_file does not handle this well: it boils down to

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 6395: ordinal not in range(128)
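
A minimal sketch that reproduces this class of failure under Python 3
follows. It assumes the file ends up being decoded with the ASCII
codec, which is forced here with an explicit encoding argument; in the
real case the codec comes from the locale. The helper names and the
sample message are hypothetical.

```python
import email
import io
import os
import tempfile


def parse_as_ascii(path):
    """Parse a mail file in text mode, forcing the ASCII codec."""
    with io.open(path, encoding='ascii') as file_:
        return email.message_from_file(file_)


def reproduce():
    """Return the UnicodeDecodeError raised while parsing, or None."""
    # Made-up message containing the c2 a0 sequence in its body.
    raw = b'Subject: demo\n\n(series view) ?\xc2\xa0 Ho\n'
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(raw)
        try:
            parse_as_ascii(path)
        except UnicodeDecodeError as exc:
            return exc
        return None
    finally:
        os.remove(path)
```

Calling reproduce() should hand back the exception rather than a parsed
message, since the text-mode read fails before the parser sees the body.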

I imagine this hasn't hit us in production because most (all?)
production users run Python 2, where str is a byte string and the file
contents are never decoded, so the bytes/string distinction that
Python 3 has does not come into play.

Anyway, the only way I've found to work around this is to do something
like this:

# Read raw bytes, decode explicitly, then parse the resulting string.
with open(args[0], 'rb') as file_:
    decoded_mail = file_.read().decode('utf-8')
    mail = email.message_from_string(decoded_mail)

This is super ugly, but works in Py3. Ironically it doesn't work in Py2,
but it's a start. Could you include something like this in this patch
set? I think parsearchive will require something similar too.
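
One possible direction, sketched here as an assumption rather than as
what the patch set does: Python 3.2+ provides
email.message_from_binary_file, which parses bytes directly, while on
Python 2 message_from_file already operates on byte strings. The
load_mail and demo helpers below are hypothetical names, and the sample
message is made up.

```python
import email
import os
import sys
import tempfile


def load_mail(path):
    """Parse the mail at `path` without decoding it up front."""
    with open(path, 'rb') as file_:
        if sys.version_info[0] >= 3:
            # Python 3.2+: the parser consumes bytes directly.
            return email.message_from_binary_file(file_)
        # Python 2: str is already a byte string.
        return email.message_from_file(file_)


def demo():
    """Round-trip a small message containing the c2 a0 sequence."""
    raw = (b'Subject: demo\n'
           b'Content-Type: text/plain; charset=utf-8\n'
           b'Content-Transfer-Encoding: 8bit\n'
           b'\n'
           b'(series view) ?\xc2\xa0 Ho\n')
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, 'wb') as out:
            out.write(raw)
        return load_mail(path)
    finally:
        os.remove(path)
```

This avoids guessing a charset for the whole file: the headers and body
keep their original bytes, and get_payload(decode=True) returns them
undecoded for the caller to interpret.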

I'm going to start collecting these "interesting" emails to make a test suite.

Regards,
Daniel

[0] https://lists.ozlabs.org/pipermail/patchwork/2016-August/003158.html


