Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > The db limit is 255: we need to make sure we don't pass anything > longer in or it will throw an exception. > > Signed-off-by: Daniel Axtens We can fix the other fields in a separate patch. This one looks good to me.

Re: [PATCH RFC] events-api: allow filtering by date

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-14 at 14:14 -0400, Aaron Conole wrote: > This commit allows users of the REST API to query for events based on > the date field.  This will allow utility writers to select a smaller > subset of events when polling. > > Signed-off-by: Aaron Conole Reviewed-by:

Re: [PATCH 09/10] Add fuzzer-generated tests

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > This is the set of tests I generated while fuzzing the parser. > > Add them so we don't regress. > > Signed-off-by: Daniel Axtens The only nit I have here is about putting the mboxes in another folder. It's a nit

Re: [PATCH 10/10] [RFC] Fuzzing harness

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: >  - install python-afl in Docker (py2 doesn't seem to work) > >  - change parser to return BrokenEmailException. This allows >    us to catch other sorts of ValueError. > >  - fuzz management command to be used in py-afl-fuzz > >

Re: [PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:35 +1000, Andrew Donnellan wrote: > On 28/06/17 16:55, Daniel Axtens wrote: > > This is just a matter of correctly closing files we open. > > > > Signed-off-by: Daniel Axtens > > Reviewed-by: Andrew Donnellan Aye, looks

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > The charset guessing algorithm doesn't work if it has to guess > multiple charsets, as it overwrites the payload with None. > > Signed-off-by: Daniel Axtens I don't imagine this would ever happen in real world, but then

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > It turns out that there is a lot that can go wrong in parsing a > date. OverflowError, ValueError and OSError have all been observed. > > If these go wrong, substitute the current datetime. > > Signed-off-by: Daniel Axtens

Re: [PATCH 00/10] Fuzzing the parser: fixes

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > Hi all, > > I fuzzed parsemail with python-afl/afl-fuzz. > > Predictably, it did not hold up well. Nothing major, just a bunch of > edge cases. Date parsing held up particularly poorly. > > Currently, if we hit any of the errors this

Re: [PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote: > In python3, mail.get() can return either a string, or an > email.header.Header type. > > clean_header() is designed to clean headers into strings, > so make sure we use that everywhere. > > Signed-off-by: Daniel Axtens

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Stephen Finucane
On Thu, 2017-06-29 at 00:06 +1000, Daniel Axtens wrote: > Andrew Donnellan writes: > > > On 28/06/17 17:48, Daniel Axtens wrote: > > > Certain really messed up email messages can cause a failure within > > > the email module (at least on py3). Catch this. > > > > >

Re: Patchwork v2 and beyond

2017-06-28 Thread Stephen Finucane
On Wed, 2017-06-28 at 12:17 +1000, Daniel Axtens wrote: > Hi Stephen, > > > Yes and no. We're on RC4 already, which is kind of mad. If you look at the > > RCs we've been releasing, most of them have been more about adding features > > to fill holes or oversights than fixing bugs. I'm concerned

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Daniel Axtens
Andrew Donnellan writes: > On 28/06/17 17:48, Daniel Axtens wrote: >> It turns out that there is a lot that can go wrong in parsing a >> date. OverflowError, ValueError and OSError have all been observed. >> >> If these go wrong, substitute the current datetime. >>

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Daniel Axtens
It turns out we need to extend this to email and name fields as well; that just popped up in the fuzzer. I'll send a followup. Regards, Daniel Andrew Donnellan writes: > On 28/06/17 17:48, Daniel Axtens wrote: >> The db limit is 255: we need to make sure we don't

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Daniel Axtens
Andrew Donnellan writes: > On 28/06/17 17:48, Daniel Axtens wrote: >> Certain really messed up email messages can cause a failure within >> the email module (at least on py3). Catch this. >> >> Signed-off-by: Daniel Axtens >> --- >>

[PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Daniel Axtens
This is just a matter of correctly closing files we open. Signed-off-by: Daniel Axtens --- patchwork/management/commands/parsearchive.py | 1 + patchwork/tests/test_management.py| 2 ++ patchwork/tests/test_series.py| 9 +

[PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Daniel Axtens
The db limit is 255: we need to make sure we don't pass anything longer in or it will throw an exception. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/patchwork/parser.py b/patchwork/parser.py

[PATCH 09/10] Add fuzzer-generated tests

2017-06-28 Thread Daniel Axtens
This is the set of tests I generated while fuzzing the parser. Add them so we don't regress. Signed-off-by: Daniel Axtens --- patchwork/tests/__init__.py | 1 + patchwork/tests/fuzztests/base64err.mbox | 46

[PATCH 05/10] parser: deal with headers entirely failing to parse

2017-06-28 Thread Daniel Axtens
It turns out that the attempts in clean_header() to convert headers to strings are not guaranteed to work: you can end up with, for example, a base64 decoding error which makes it impossible to determine any header content. In this case, sanitise_header() should return None, and thus

[PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Daniel Axtens
Certain really messed up email messages can cause a failure within the email module (at least on py3). Catch this. Signed-off-by: Daniel Axtens --- patchwork/management/commands/parsearchive.py | 9 patchwork/management/commands/parsemail.py| 31

[PATCH 06/10] parser: better date parsing

2017-06-28 Thread Daniel Axtens
It turns out that there is a lot that can go wrong in parsing a date. OverflowError, ValueError and OSError have all been observed. If these go wrong, substitute the current datetime. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 25 - 1 file

Re: [PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: On Python3 this presents as a ValueError On Python2 this presents as a TypeError In both cases, catch these exceptions. Signed-off-by: Daniel Axtens This is the kind of error which one would only ever find by fuzzing... Error classes

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: It turns out that there is a lot that can go wrong in parsing a date. OverflowError, ValueError and OSError have all been observed. If these go wrong, substitute the current datetime. Signed-off-by: Daniel Axtens You could merge all

Re: [PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: In python3, mail.get() can return either a string, or an email.header.Header type. clean_header() is designed to clean headers into strings, so make sure we use that everywhere. Signed-off-by: Daniel Axtens All of this looks good.

Re: [PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Andrew Donnellan
On 28/06/17 16:55, Daniel Axtens wrote: This is just a matter of correctly closing files we open. Signed-off-by: Daniel Axtens Reviewed-by: Andrew Donnellan --- patchwork/management/commands/parsearchive.py | 1 +

[PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Daniel Axtens
The charset guessing algorithm doesn't work if it has to guess multiple charsets, as it overwrites the payload with None. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/patchwork/parser.py

[PATCH 00/10] Fuzzing the parser: fixes

2017-06-28 Thread Daniel Axtens
Hi all, I fuzzed parsemail with python-afl/afl-fuzz. Predictably, it did not hold up well. Nothing major, just a bunch of edge cases. Date parsing held up particularly poorly. Currently, if we hit any of the errors this patch set fixes, the entire mail will be rejected. With this patch set, the

[PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

2017-06-28 Thread Daniel Axtens
On Python3 this presents as a ValueError On Python2 this presents as a TypeError In both cases, catch these exceptions. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patchwork/parser.py

[PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Daniel Axtens
In python3, mail.get() can return either a string, or an email.header.Header type. clean_header() is designed to clean headers into strings, so make sure we use that everywhere. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 18 +- 1 file changed, 9

[PATCH 04/10] parser: catch failures in decoding headers

2017-06-28 Thread Daniel Axtens
Headers can fail to decode: - if a part cannot be encoded as ascii - if the coding hint names a codec that doesn't exist - if there's a null byte in the codec name Catch these errors. Signed-off-by: Daniel Axtens --- patchwork/parser.py | 8 +--- 1 file changed, 5

Re: [PATCH 04/10] parser: catch failures in decoding headers

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: Headers can fail to decode: - if a part cannot be encoded as ascii - if the coding hint names a codec that doesn't exist - if there's a null byte in the codec name Catch these errors. Signed-off-by: Daniel Axtens Reviewed-by:

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: The db limit is 255: we need to make sure we don't pass anything longer in or it will throw an exception. Signed-off-by: Daniel Axtens Reviewed-by: Andrew Donnellan --- patchwork/parser.py | 6 --

[PATCH 10/10] [RFC] Fuzzing harness

2017-06-28 Thread Daniel Axtens
- install python-afl in Docker (py2 doesn't seem to work) - change parser to return BrokenEmailException. This allows us to catch other sorts of ValueError. - fuzz management command to be used in py-afl-fuzz Signed-off-by: Daniel Axtens ---

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Andrew Donnellan
Reviewed-by: Andrew Donnellan On 28/06/17 17:48, Daniel Axtens wrote: The charset guessing algorithm doesn't work if it has to guess multiple charsets, as it overwrites the payload with None. Signed-off-by: Daniel Axtens --- patchwork/parser.py

Re: [PATCH 05/10] parser: deal with headers entirely failing to parse

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: It turns out that the attempts in clean_header() to convert headers to strings are not guaranteed to work: you can end up with, for example, a base64 decoding error which makes it impossible to determine any header content. In this case,

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Andrew Donnellan
On 28/06/17 17:48, Daniel Axtens wrote: Certain really messed up email messages can cause a failure within the email module (at least on py3). Catch this. Signed-off-by: Daniel Axtens --- patchwork/management/commands/parsearchive.py | 9