Re: [PATCH 1/3] parse(mail|archive): handle early fail within email module

2017-07-03 Thread Stephen Finucane
On Mon, 2017-07-03 at 15:32 +1000, Andrew Donnellan wrote:
> On 03/07/17 15:31, Andrew Donnellan wrote:
> > So we should report this bug upstream, but we should also merge this fix
> > to work around it.
> 
> Oh, and this fix is, obviously:
> 
> Reviewed-by: Andrew Donnellan 

Thanks for the providing the context. I've added a note on this and applied it.

Reviewed-by: Stephen Finucane 

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH 1/3] parse(mail|archive): handle early fail within email module

2017-07-03 Thread Andrew Donnellan

On 03/07/17 16:04, Daniel Axtens wrote:

I'm up to my eyeballs in kernel networking code, so I'd really
appreciate that.


My envy for your current circumstances is limited...

Will do that this evening.

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH 1/3] parse(mail|archive): handle early fail within email module

2017-07-03 Thread Daniel Axtens
Hi Andrew,

> dja - I'm happy to write this up and file it if you haven't already?

I'm up to my eyeballs in kernel networking code, so I'd really
appreciate that.

Regards,
Daniel

>
>
> Andrew
>
>
>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> andrew.donnel...@au1.ibm.com  IBM Australia Limited
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH 1/3] parse(mail|archive): handle early fail within email module

2017-07-02 Thread Andrew Donnellan

On 03/07/17 15:31, Andrew Donnellan wrote:

So we should report this bug upstream, but we should also merge this fix
to work around it.


Oh, and this fix is, obviously:

Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [PATCH 1/3] parse(mail|archive): handle early fail within email module

2017-07-02 Thread Andrew Donnellan

On 01/07/17 14:28, Daniel Axtens wrote:

Certain really messed up email messages can cause a failure within
the email module (at least on py3). Catch this.

Signed-off-by: Daniel Axtens 
---
 patchwork/management/commands/parsearchive.py | 15 +
 patchwork/management/commands/parsemail.py| 31 ---
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/patchwork/management/commands/parsearchive.py 
b/patchwork/management/commands/parsearchive.py
index 4e102a988e76..a9abbbff0d0d 100644
--- a/patchwork/management/commands/parsearchive.py
+++ b/patchwork/management/commands/parsearchive.py
@@ -77,6 +77,21 @@ class Command(BaseCommand):

 count = len(mbox)

+# Iterate through the mbox. This will pick up exceptions that
+# are only thrown when a broken email is found part way
+# through. Without this block, we'd get the exception thrown
+# in enumerate(mbox) below, which is harder to catch.
+#
+# The alternative is converting the mbox to a list of
+# messages, but that requires holding the entire thing in
+# memory, which is wateful.
+try:
+for m in mbox:
+pass
+except AttributeError:
+logger.warning('Broken mbox/Maildir, aborting')
+return
+


So after further discussion with Daniel and taking a look at the test 
case, we've figured out what this is, and why merely iterating through 
this triggers an AttributeError of all things.


This is a bug in the Python email module, triggered when a multipart 
message has a Content-Transfer-Encoding header with a value which can't 
be validly decoded.


A minimal case:

From xyz
Content-Type: multipart/mixed; boundary="1"
	Content-Transfer-Encoding: \200 [where \200 can be any byte that's not 
valid ASCII]


Save that to test.mbox:

import mailbox
mbox = mailbox.mbox('test.mbox')
mbox[0]

and you'll hit this.

mailbox's __getitem__() eventually calls down into email.parser.parse() 
-> email.feedparser.close() -> email.feedparser._call_parse() -> 
email.feedparser._parsegen().


In there, we see:

# Make sure a valid content type was specified per RFC 2045:6.4.
if (self._cur.get('content-transfer-encoding', '8bit').lower()
not in ('7bit', '8bit', 'binary')):
defect = errors.InvalidMultipartContentTransferEncodingDefect()
self.policy.handle_defect(self._cur, defect)

So it looks like self._cur.get() is returning a Header rather than a str 
because it can't be validly decoded, but here we expect it to be a str 
and to have .lower(). It's not, so therefore AttributeError!


Intuitively the correct behaviour would be to handle this case the same 
as any other invalid encoding value and handle the 
InvalidMultipartContentTransferEncodingDefect.


So we should report this bug upstream, but we should also merge this fix 
to work around it.


dja - I'm happy to write this up and file it if you haven't already?


Andrew



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork