date:20170628

Re: [PATCH 00/10] Fuzzing the parser: fixes

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> Hi all,
> 
> I fuzzed parsemail with python-afl/afl-fuzz.
> 
> Predictably, it did not hold up well. Nothing major, just a bunch of
> edge cases. Date parsing held up particularly poorly.
> 
> Currently, if we hit any of the errors this patch set fixes, the
> entire mail will be rejected. With this patch set, the emails have a
> much better chance of surviving. This is helpful for us - we have
> previously had bug reports where a stray mail has been rejected due to
> a corrupt header, which is suboptimal.
> 
> Test cases are included. They may not email particularly well so I
> have also put them up on my github as a signed tag:
> https://github.com/daxtens/patchwork/releases/tag/fuzz-testing
> 
> For the interested, the bulk of the fuzzing was done on an AWS cloud
> instance, using something quite similar to the standard docker-compose
> setup. Several hundered thousand executions were done - nowhere near
> what I'd like but I can't seem to get execution speed above about 1
> execution per second. If I was able to confidently mock out the db or
> extract the parser out of the django framework it would go faster, but
> both of those were a bit more than I was easily able to do. I also
> picked up some bugs when interfacing with the db (patch 7) so I feel
> like this approach was vindicated.
> 
> I also included a patch to allow people to replicate my setup. It's
> not really ready to merge and I'm not convinced it's really necessary
> as it's not a lot of work to replicate.

I've merged all patches except 08/10 and 10/10. The former needs a bit of
rework, and the latter can wait til 2.1 at least.

I can tag either rc5 or final once 08/10 is resent.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 10/10] [RFC] Fuzzing harness

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
>  - install python-afl in Docker (py2 doesn't seem to work)
> 
>  - change parser to return BrokenEmailException. This allows
>    us to catch other sorts of ValueError.
> 
>  - fuzz management command to be used in py-afl-fuzz
> 
> Signed-off-by: Daniel Axtens 

FYI I haven't reviewed this, given that it's an RFC and non-critical, per your
comments. Will come back and revisit post 2.0.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 09/10] Add fuzzer-generated tests

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> This is the set of tests I generated while fuzzing the parser.
> 
> Add them so we don't regress.
> 
> Signed-off-by: Daniel Axtens 

The only nit I have here is about putting the mboxes in another folder. It's a
nit though, so meh. Other than that, it's all good.

Reviewed-by: Stephen Finucane 

...and applied.
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> The db limit is 255: we need to make sure we don't pass anything
> longer in or it will throw an exception.
> 
> Signed-off-by: Daniel Axtens 

We can fix the other fields in a separate patch. This one looks good to me.

Reviewed-by: Stephen Finucane 
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> It turns out that there is a lot that can go wrong in parsing a
> date. OverflowError, ValueError and OSError have all been observed.
> 
> If these go wrong, substitute the current datetime.
> 
> Signed-off-by: Daniel Axtens 

Reviewed-by: Stephen Finucane 

...and applied with one change.

> ---
>  patchwork/parser.py | 25 -
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 203e11584504..80450c2e4860 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -344,10 +344,33 @@ def find_date(mail):
>  h = clean_header(mail.get('Date', ''))
>  if not h:
>  return datetime.datetime.utcnow()
> +
>  t = parsedate_tz(h)
>  if not t:
>  return datetime.datetime.utcnow()
> -return datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +
> +try:
> +d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
> +except OverflowError:
> +# If you have a date like:
> +# Date: Wed, 4 Jun 2014 17:50:46 0
> +# then you can end up with:
> +# OverflowError: Python int too large to convert to C long
> +d = datetime.datetime.utcnow()
> +except ValueError:
> +# If you have a date like:
> +# Date:, 11 Sep 2016 23:22:904070804030804 +0100
> +# then you can end up with:
> +# ValueError: year is out of range
> +d = datetime.datetime.utcnow()
> +except OSError:
> +# If you have a date like:
> +# Date:, 11 Sep 2016 407080403080105:04 +0100
> +# then you can end up with (in py3)
> +# OSError: [Errno 75] Value too large for defined data type
> +d = datetime.datetime.utcnow()

Merged all of these into one, as suggested by Andrew.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> In python3, mail.get() can return either a string, or an
> email.header.Header type.
> 
> clean_header() is designed to clean headers into strings,
> so make sure we use that everywhere.
> 
> Signed-off-by: Daniel Axtens 

Reviewed-by: Stephen Finucane 

and applied.

Stephen

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:48 +1000, Daniel Axtens wrote:
> The charset guessing algorithm doesn't work if it has to guess
> multiple charsets, as it overwrites the payload with None.
> 
> Signed-off-by: Daniel Axtens 

I don't imagine this would ever happen in real world, but then again nothing
would surprise me at this point. Looks good to me. Thus:

Reviewed-by: Stephen Finucane 

and applied.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Stephen Finucane

On Thu, 2017-06-29 at 00:06 +1000, Daniel Axtens wrote:
> Andrew Donnellan  writes:
> 
> > On 28/06/17 17:48, Daniel Axtens wrote:
> > > Certain really messed up email messages can cause a failure within
> > > the email module (at least on py3). Catch this.
> > > 
> > > Signed-off-by: Daniel Axtens 
> > > ---
> > >  patchwork/management/commands/parsearchive.py |  9 
> > >  patchwork/management/commands/parsemail.py| 31 -
> > > --
> > >  2 files changed, 27 insertions(+), 13 deletions(-)
> > > 
> > > diff --git a/patchwork/management/commands/parsearchive.py
> > > b/patchwork/management/commands/parsearchive.py
> > > index a3c8360186c8..3aab58a45bcd 100644
> > > --- a/patchwork/management/commands/parsearchive.py
> > > +++ b/patchwork/management/commands/parsearchive.py
> > > @@ -77,6 +77,15 @@ class Command(BaseCommand):
> > > 
> > >  count = len(mbox)
> > > 
> > > +# detect broken mails in the mbox
> > > +# see earlyfail fuzz test on py3
> > > +try:
> > > +for m in mbox:
> > > +pass
> > > +except AttributeError:
> > > +logger.warning('Broken mbox/Maildir, aborting')
> > > +return
> > > +
> > 
> > This seems a bit non-obvious and could do with a little bit of explanation?
> 
> The message, or the code structure? I structured the code this way
> rather than the more obvious
>  try:
>    mbox = [m for m in mbox]
>  ...
> because the more obvious way requires loading the entire mbox/maildir
> into memory and I was a bit worried about the memory consumption of that
> when parsing a large mbox.
> 
> I agree a more helpful comment would have been in order. Stephen, do you
> want a v2 of this patch by itself? I can resend the series but it seems
> a bit excessive... Or I could do a follow-up.

Yeah, let's just do a follow-up. I'm going to go ahead and merge the rest of
these.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 17:35 +1000, Andrew Donnellan wrote:
> On 28/06/17 16:55, Daniel Axtens wrote:
> > This is just a matter of correctly closing files we open.
> > 
> > Signed-off-by: Daniel Axtens 
> 
> Reviewed-by: Andrew Donnellan 

Aye, looks good to me too.

Reviewed-by: Stephen Finucane 

...and applied.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH RFC] events-api: allow filtering by date

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-14 at 14:14 -0400, Aaron Conole wrote:
> This commit allows users of the REST API to query for events based on
> the date field.  This will allow utility writers to select a smaller
> subset of events when polling.
> 
> Signed-off-by: Aaron Conole 

Reviewed-by: Stephen Finucane 

...and applied. Thanks, Aaron.

Stephen

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Daniel Axtens

Andrew Donnellan  writes:

> On 28/06/17 17:48, Daniel Axtens wrote:
>> Certain really messed up email messages can cause a failure within
>> the email module (at least on py3). Catch this.
>>
>> Signed-off-by: Daniel Axtens 
>> ---
>>  patchwork/management/commands/parsearchive.py |  9 
>>  patchwork/management/commands/parsemail.py| 31 
>> ---
>>  2 files changed, 27 insertions(+), 13 deletions(-)
>>
>> diff --git a/patchwork/management/commands/parsearchive.py 
>> b/patchwork/management/commands/parsearchive.py
>> index a3c8360186c8..3aab58a45bcd 100644
>> --- a/patchwork/management/commands/parsearchive.py
>> +++ b/patchwork/management/commands/parsearchive.py
>> @@ -77,6 +77,15 @@ class Command(BaseCommand):
>>
>>  count = len(mbox)
>>
>> +# detect broken mails in the mbox
>> +# see earlyfail fuzz test on py3
>> +try:
>> +for m in mbox:
>> +pass
>> +except AttributeError:
>> +logger.warning('Broken mbox/Maildir, aborting')
>> +return
>> +
>
> This seems a bit non-obvious and could do with a little bit of explanation?

The message, or the code structure? I structured the code this way
rather than the more obvious
 try:
   mbox = [m for m in mbox]
 ...
because the more obvious way requires loading the entire mbox/maildir
into memory and I was a bit worried about the memory consumption of that
when parsing a large mbox.

I agree a more helpful comment would have been in order. Stephen, do you
want a v2 of this patch by itself? I can resend the series but it seems
a bit excessive... Or I could do a follow-up.

Regards,
Daniel

>
>>  logger.info('Parsing %d mails', count)
>>  for i, msg in enumerate(mbox):
>>  try:
>> diff --git a/patchwork/management/commands/parsemail.py 
>> b/patchwork/management/commands/parsemail.py
>> index 9adfb25b09e3..52ec8bc56899 100644
>> --- a/patchwork/management/commands/parsemail.py
>> +++ b/patchwork/management/commands/parsemail.py
>> @@ -58,20 +58,25 @@ class Command(base.BaseCommand):
>>  def handle(self, *args, **options):
>>  infile = args[0] if args else options['infile']
>>
>> -if infile:
>> -logger.info('Parsing mail loaded by filename')
>> -if six.PY3:
>> -with open(infile, 'rb') as file_:
>> -mail = email.message_from_binary_file(file_)
>> -else:
>> -with open(infile) as file_:
>> -mail = email.message_from_file(file_)
>> -else:
>> -logger.info('Parsing mail loaded from stdin')
>> -if six.PY3:
>> -mail = email.message_from_binary_file(sys.stdin.buffer)
>> +try:
>> +if infile:
>> +logger.info('Parsing mail loaded by filename')
>> +if six.PY3:
>> +with open(infile, 'rb') as file_:
>> +mail = email.message_from_binary_file(file_)
>> +else:
>> +with open(infile) as file_:
>> +mail = email.message_from_file(file_)
>>  else:
>> -mail = email.message_from_file(sys.stdin)
>> +logger.info('Parsing mail loaded from stdin')
>> +if six.PY3:
>> +mail = email.message_from_binary_file(sys.stdin.buffer)
>> +else:
>> +mail = email.message_from_file(sys.stdin)
>> +except AttributeError:
>> +logger.warning("Broken email ignored")
>> +return
>> +
>>  try:
>>  result = parse_mail(mail, options['list_id'])
>>  if result:
>>
>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> andrew.donnel...@au1.ibm.com  IBM Australia Limited
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Daniel Axtens

It turns out we need to extend this to email and name fields as well;
that just popped up in the fuzzer. I'll send a followup.

Regards,
Daniel

Andrew Donnellan  writes:

> On 28/06/17 17:48, Daniel Axtens wrote:
>> The db limit is 255: we need to make sure we don't pass anything
>> longer in or it will throw an exception.
>>
>> Signed-off-by: Daniel Axtens 
>
> Reviewed-by: Andrew Donnellan 
>
>> ---
>>  patchwork/parser.py | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 80450c2e4860..46e6ca161574 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -225,7 +225,7 @@ def _find_series_by_references(project, mail):
>>  for ref in refs:
>>  try:
>>  return SeriesReference.objects.get(
>> -msgid=ref, series__project=project).series
>> +msgid=ref[:255], series__project=project).series
>>  except SeriesReference.DoesNotExist:
>>  continue
>>
>> @@ -557,6 +557,7 @@ def find_comment_content(mail):
>>
>>  def find_submission_for_comment(project, refs):
>>  for ref in refs:
>> +ref = ref[:255]
>>  # first, check for a direct reply
>>  try:
>>  submission = Submission.objects.get(project=project, msgid=ref)
>> @@ -934,7 +935,7 @@ def parse_mail(mail, list_id=None):
>>  msgid = clean_header(mail.get('Message-Id'))
>>  if not msgid:
>>  raise ValueError("Broken 'Message-Id' header")
>> -msgid = msgid.strip()
>> +msgid = msgid.strip()[:255]
>>
>>  author = find_author(mail)
>>  subject = mail.get('Subject')
>> @@ -993,6 +994,7 @@ def parse_mail(mail, list_id=None):
>>  # be possible to identify the relationship between patches
>>  # as the earlier patch does not reference the later one.
>>  for ref in refs + [msgid]:
>> +ref = ref[:255]
>>  # we don't want duplicates
>>  try:
>>  # we could have a ref to a previous series. (For
>>
>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> andrew.donnel...@au1.ibm.com  IBM Australia Limited
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Daniel Axtens

Andrew Donnellan  writes:

> On 28/06/17 17:48, Daniel Axtens wrote:
>> It turns out that there is a lot that can go wrong in parsing a
>> date. OverflowError, ValueError and OSError have all been observed.
>>
>> If these go wrong, substitute the current datetime.
>>
>> Signed-off-by: Daniel Axtens 
>
> You could merge all those into one except block.

Yeah true. Stephen, if you want to do that at merge time that'd be
fine with me.

>
> Regardless,
>
> Reviewed-by: Andrew Donnellan 
>
>> ---
>>  patchwork/parser.py | 25 -
>>  1 file changed, 24 insertions(+), 1 deletion(-)
>>
>> diff --git a/patchwork/parser.py b/patchwork/parser.py
>> index 203e11584504..80450c2e4860 100644
>> --- a/patchwork/parser.py
>> +++ b/patchwork/parser.py
>> @@ -344,10 +344,33 @@ def find_date(mail):
>>  h = clean_header(mail.get('Date', ''))
>>  if not h:
>>  return datetime.datetime.utcnow()
>> +
>>  t = parsedate_tz(h)
>>  if not t:
>>  return datetime.datetime.utcnow()
>> -return datetime.datetime.utcfromtimestamp(mktime_tz(t))
>> +
>> +try:
>> +d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
>> +except OverflowError:
>> +# If you have a date like:
>> +# Date: Wed, 4 Jun 2014 17:50:46 0
>> +# then you can end up with:
>> +# OverflowError: Python int too large to convert to C long
>> +d = datetime.datetime.utcnow()
>> +except ValueError:
>> +# If you have a date like:
>> +# Date:, 11 Sep 2016 23:22:904070804030804 +0100
>> +# then you can end up with:
>> +# ValueError: year is out of range
>> +d = datetime.datetime.utcnow()
>> +except OSError:
>> +# If you have a date like:
>> +# Date:, 11 Sep 2016 407080403080105:04 +0100
>> +# then you can end up with (in py3)
>> +# OSError: [Errno 75] Value too large for defined data type
>> +d = datetime.datetime.utcnow()
>> +
>> +return d
>>
>>
>>  def find_headers(mail):
>>
>
> -- 
> Andrew Donnellan  OzLabs, ADL Canberra
> andrew.donnel...@au1.ibm.com  IBM Australia Limited
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: Patchwork v2 and beyond

2017-06-28 Thread Stephen Finucane

On Wed, 2017-06-28 at 12:17 +1000, Daniel Axtens wrote:
> Hi Stephen,
> 
> > Yes and no. We're on RC4 already, which is kind of mad. If you look at the
> > RCs we've been releasing, most of them have been more about adding features
> > to fill holes or oversights than fixing bugs. I'm concerned that if we
> > continue on this trajectory, new features will keep coming in and we'll
> > never get the release out. To be honest, at this point I want to just merge
> > the "single patch as series feature" and, barring any serious issues from
> > the below, release a final version. We're not Debian, but we've done a lot
> > of real world testing with popular mailing lists and found very few issues.
> > If we do encounter some later, I'm more than happy to make PATCH releases
> > as required. I'll hang on for the fuzzing results, but after that I really
> > want to cut this and move on to getting it deployed and used in the real
> > world.
> 
> Yeah, I understand that and I think we'll probably get away with it this
> time without any major issues. In hindsight, I think we probably needed
> to call a feature-freeze explicitly. I kind of assumed the rc process
> would do that because that is how it works in kernel land, but that
> hasn't been what ended up happening. In hindsight, I probably should
> have kicked off this discussion before the start of the rc process.

Hmm, I'm not sure think a feature freeze was necessarily the solution, as most
of the changes that have gone affected the API responses/behaviour and could
not be merged in a separate release (at least, not without immediately bumping
the MAJOR version of the API in the next release and making folks write clients
to handle both). However, the significance of the changes between rc1 and rc4
(and soon, rc5) suggest that "release candidate" was the wrong messaging. Maybe
alpha and beta tags would be a more appropriate solution next time we're making
a MAJOR release like this?

> > ** REST support for pwclient
> > 
> >    See notes above. It's worth mentioning that I've already started work on
> > separating out pwclient into its own Git repo, but I plan to continue to
> > use
> > the same mailing list for patch, bugs, etc. This leads us to...
> 
> It would be nice to make it pip install capable and/or putting it in a
> PPA on launchpad so people can apt-get install it. Happy to help with
> that.

Yup, that's the plan, and my wish to use Git metadata for versioning is a
primary driver of the split. I've already got the name registered on PyPi so we
can do this in the near future.

Stephen
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

Certain really messed up email messages can cause a failure within
the email module (at least on py3). Catch this.

Signed-off-by: Daniel Axtens 
---
 patchwork/management/commands/parsearchive.py |  9 
 patchwork/management/commands/parsemail.py| 31 ---
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/patchwork/management/commands/parsearchive.py 
b/patchwork/management/commands/parsearchive.py
index a3c8360186c8..3aab58a45bcd 100644
--- a/patchwork/management/commands/parsearchive.py
+++ b/patchwork/management/commands/parsearchive.py
@@ -77,6 +77,15 @@ class Command(BaseCommand):

 count = len(mbox)

+# detect broken mails in the mbox
+# see earlyfail fuzz test on py3
+try:
+for m in mbox:
+pass
+except AttributeError:
+logger.warning('Broken mbox/Maildir, aborting')
+return
+


This seems a bit non-obvious and could do with a little bit of explanation?


 logger.info('Parsing %d mails', count)
 for i, msg in enumerate(mbox):
 try:
diff --git a/patchwork/management/commands/parsemail.py 
b/patchwork/management/commands/parsemail.py
index 9adfb25b09e3..52ec8bc56899 100644
--- a/patchwork/management/commands/parsemail.py
+++ b/patchwork/management/commands/parsemail.py
@@ -58,20 +58,25 @@ class Command(base.BaseCommand):
 def handle(self, *args, **options):
 infile = args[0] if args else options['infile']

-if infile:
-logger.info('Parsing mail loaded by filename')
-if six.PY3:
-with open(infile, 'rb') as file_:
-mail = email.message_from_binary_file(file_)
-else:
-with open(infile) as file_:
-mail = email.message_from_file(file_)
-else:
-logger.info('Parsing mail loaded from stdin')
-if six.PY3:
-mail = email.message_from_binary_file(sys.stdin.buffer)
+try:
+if infile:
+logger.info('Parsing mail loaded by filename')
+if six.PY3:
+with open(infile, 'rb') as file_:
+mail = email.message_from_binary_file(file_)
+else:
+with open(infile) as file_:
+mail = email.message_from_file(file_)
 else:
-mail = email.message_from_file(sys.stdin)
+logger.info('Parsing mail loaded from stdin')
+if six.PY3:
+mail = email.message_from_binary_file(sys.stdin.buffer)
+else:
+mail = email.message_from_file(sys.stdin)
+except AttributeError:
+logger.warning("Broken email ignored")
+return
+
 try:
 result = parse_mail(mail, options['list_id'])
 if result:



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

The db limit is 255: we need to make sure we don't pass anything
longer in or it will throw an exception.

Signed-off-by: Daniel Axtens 


Reviewed-by: Andrew Donnellan 


---
 patchwork/parser.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 80450c2e4860..46e6ca161574 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -225,7 +225,7 @@ def _find_series_by_references(project, mail):
 for ref in refs:
 try:
 return SeriesReference.objects.get(
-msgid=ref, series__project=project).series
+msgid=ref[:255], series__project=project).series
 except SeriesReference.DoesNotExist:
 continue

@@ -557,6 +557,7 @@ def find_comment_content(mail):

 def find_submission_for_comment(project, refs):
 for ref in refs:
+ref = ref[:255]
 # first, check for a direct reply
 try:
 submission = Submission.objects.get(project=project, msgid=ref)
@@ -934,7 +935,7 @@ def parse_mail(mail, list_id=None):
 msgid = clean_header(mail.get('Message-Id'))
 if not msgid:
 raise ValueError("Broken 'Message-Id' header")
-msgid = msgid.strip()
+msgid = msgid.strip()[:255]

 author = find_author(mail)
 subject = mail.get('Subject')
@@ -993,6 +994,7 @@ def parse_mail(mail, list_id=None):
 # be possible to identify the relationship between patches
 # as the earlier patch does not reference the later one.
 for ref in refs + [msgid]:
+ref = ref[:255]
 # we don't want duplicates
 try:
 # we could have a ref to a previous series. (For



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 06/10] parser: better date parsing

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

It turns out that there is a lot that can go wrong in parsing a
date. OverflowError, ValueError and OSError have all been observed.

If these go wrong, substitute the current datetime.

Signed-off-by: Daniel Axtens 


You could merge all those into one except block.

Regardless,

Reviewed-by: Andrew Donnellan 


---
 patchwork/parser.py | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 203e11584504..80450c2e4860 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -344,10 +344,33 @@ def find_date(mail):
 h = clean_header(mail.get('Date', ''))
 if not h:
 return datetime.datetime.utcnow()
+
 t = parsedate_tz(h)
 if not t:
 return datetime.datetime.utcnow()
-return datetime.datetime.utcfromtimestamp(mktime_tz(t))
+
+try:
+d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
+except OverflowError:
+# If you have a date like:
+# Date: Wed, 4 Jun 2014 17:50:46 0
+# then you can end up with:
+# OverflowError: Python int too large to convert to C long
+d = datetime.datetime.utcnow()
+except ValueError:
+# If you have a date like:
+# Date:, 11 Sep 2016 23:22:904070804030804 +0100
+# then you can end up with:
+# ValueError: year is out of range
+d = datetime.datetime.utcnow()
+except OSError:
+# If you have a date like:
+# Date:, 11 Sep 2016 407080403080105:04 +0100
+# then you can end up with (in py3)
+# OSError: [Errno 75] Value too large for defined data type
+d = datetime.datetime.utcnow()
+
+return d


 def find_headers(mail):



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 05/10] parser: deal with headers entirely failing to parse

2017-06-28 Thread Andrew Donnellan




On 28/06/17 17:48, Daniel Axtens wrote:

It turns out that the attempts in clean_header() to convert
headers to strings are not guaranteed to work: you can end up with,
for example, a base64 decoding error which makes it impossible
to determine any header content.

In this case, sanitise_header() should return None, and thus
clean_header() should return None. We then need to plumb that
through.

Signed-off-by: Daniel Axtens 


All looks good

Reviewed-by: Andrew Donnellan 


---
 patchwork/parser.py | 70 +
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 032af8a7be7c..203e11584504 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -23,6 +23,7 @@ from email.header import decode_header
 from email.header import make_header
 from email.utils import mktime_tz
 from email.utils import parsedate_tz
+from email.errors import HeaderParseError
 from fnmatch import fnmatch
 import logging
 import re
@@ -67,6 +68,13 @@ def sanitise_header(header_contents, header_name=None):
 then encode the result with make_header()
 """

+try:
+value = decode_header(header_contents)
+except HeaderParseError:
+# something has gone really wrong with header parsing.
+# (e.g. base64 decoding) We probably can't recover, so:
+return None
+
 # We have some Py2/Py3 issues here.
 #
 # Firstly, the email parser (before we get here)
@@ -85,7 +93,6 @@ def sanitise_header(header_contents, header_name=None):
 # We solve this by catching any Unicode errors, and then manually
 # handling any interesting headers.

-value = decode_header(header_contents)
 try:
 header = make_header(value,
  header_name=header_name,
@@ -130,6 +137,9 @@ def clean_header(header):

 sane_header = sanitise_header(header)

+if sane_header is None:
+return None
+
 # on Py2, we want to do unicode(), on Py3, str().
 # That gets us the decoded, un-wrapped header.
 if six.PY2:
@@ -157,9 +167,12 @@ def find_project_by_header(mail):

 for header in list_id_headers:
 if header in mail:
+h = clean_header(mail.get(header))
+if not h:
+continue

 for listid_re in listid_res:
-match = listid_re.match(clean_header(mail.get(header)))
+match = listid_re.match(h)
 if match:
 break

@@ -205,7 +218,11 @@ def _find_series_by_references(project, mail):
 Returns:
 The matching ``Series`` instance, if any
 """
-for ref in [clean_header(mail.get('Message-Id'))] + find_references(mail):
+refs = find_references(mail)
+h = clean_header(mail.get('Message-Id'))
+if h:
+refs = [h] + refs
+for ref in refs:
 try:
 return SeriesReference.objects.get(
 msgid=ref, series__project=project).series
@@ -274,6 +291,10 @@ def find_series(project, mail):

 def find_author(mail):
 from_header = clean_header(mail.get('From'))
+
+if not from_header:
+raise ValueError("Invalid 'From' header")
+
 name, email = (None, None)

 # tuple of (regex, fn)
@@ -320,7 +341,10 @@ def find_author(mail):


 def find_date(mail):
-t = parsedate_tz(clean_header(mail.get('Date', '')))
+h = clean_header(mail.get('Date', ''))
+if not h:
+return datetime.datetime.utcnow()
+t = parsedate_tz(h)
 if not t:
 return datetime.datetime.utcnow()
 return datetime.datetime.utcfromtimestamp(mktime_tz(t))
@@ -331,7 +355,7 @@ def find_headers(mail):
for key, value in mail.items()]

 strings = [('%s: %s' % (key, header.encode()))
-   for (key, header) in headers]
+   for (key, header) in headers if header is not None]

 return '\n'.join(strings)

@@ -347,11 +371,16 @@ def find_references(mail):

 if 'In-Reply-To' in mail:
 for in_reply_to in mail.get_all('In-Reply-To'):
-refs.append(clean_header(in_reply_to).strip())
+r = clean_header(in_reply_to)
+if r:
+refs.append(r.strip())

 if 'References' in mail:
 for references_header in mail.get_all('References'):
-references = clean_header(references_header).split()
+h = clean_header(references_header)
+if not h:
+continue
+references = h.split()
 references.reverse()
 for ref in references:
 ref = ref.strip()
@@ -573,6 +602,9 @@ def clean_subject(subject, drop_prefixes=None):
 prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
 subject = clean_header(subject)

+if not subject:
+raise ValueError("Invalid 'Subject' header")
+
 if drop_prefixes is None:
 drop_prefixes = []

Re: [PATCH 04/10] parser: catch failures in decoding headers

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

Headers can fail to decode:
 - if a part cannot be encoded as ascii
 - if the coding hint names a codec that doesn't exist
 - if there's a null byte in the codec name

Catch these errors.

Signed-off-by: Daniel Axtens 


Reviewed-by: Andrew Donnellan 


---
 patchwork/parser.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 19ed7ca4e315..032af8a7be7c 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -90,9 +90,11 @@ def sanitise_header(header_contents, header_name=None):
 header = make_header(value,
  header_name=header_name,
  continuation_ws='\t')
-except UnicodeDecodeError:
-# At least one of the parts cannot be encoded as ascii.
-# Find out which one and fix it somehow.
+except (UnicodeDecodeError, LookupError, ValueError, TypeError):
+#  - a part cannot be encoded as ascii. (UnicodeDecodeError), or
+#  - we don't have a codec matching the hint (LookupError)
+#  - the codec has a null byte (Py3 ValueError/Py2 TypeError)
+# Find out which part and fix it somehow.
 #
 # We get here under Py2 when there's non-7-bit chars in header,
 # or under Py2 or Py3 where decoding with the coding hint fails.



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

On Python3 this presents as a ValueError
On Python2 this presents as a TypeError

In both cases, catch these exceptions.

Signed-off-by: Daniel Axtens 


This is the kind of error which one would only ever find by fuzzing...

Error classes appear correct

Reviewed-by: Andrew Donnellan 


---
 patchwork/parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 3ab4eb3d2011..19ed7ca4e315 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -436,7 +436,7 @@ def _find_content(mail):
 if charset is not None:
 try:
 codecs.lookup(charset)
-except LookupError:
+except (LookupError, ValueError, TypeError):
 charset = None

 # If there is no charset or if it is unknown, then try some common



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Andrew Donnellan


On 28/06/17 17:48, Daniel Axtens wrote:

In python3, mail.get() can return either a string, or an
email.header.Header type.

clean_header() is designed to clean headers into strings,
so make sure we use that everywhere.

Signed-off-by: Daniel Axtens 


All of this looks good.

Reviewed-by: Andrew Donnellan 




---
 patchwork/parser.py | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 4903aa8237e6..3ab4eb3d2011 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -157,7 +157,7 @@ def find_project_by_header(mail):
 if header in mail:

 for listid_re in listid_res:
-match = listid_re.match(mail.get(header))
+match = listid_re.match(clean_header(mail.get(header)))
 if match:
 break

@@ -203,7 +203,7 @@ def _find_series_by_references(project, mail):
 Returns:
 The matching ``Series`` instance, if any
 """
-for ref in [mail.get('Message-Id')] + find_references(mail):
+for ref in [clean_header(mail.get('Message-Id'))] + find_references(mail):
 try:
 return SeriesReference.objects.get(
 msgid=ref, series__project=project).series
@@ -318,7 +318,7 @@ def find_author(mail):


 def find_date(mail):
-t = parsedate_tz(mail.get('Date', ''))
+t = parsedate_tz(clean_header(mail.get('Date', '')))
 if not t:
 return datetime.datetime.utcnow()
 return datetime.datetime.utcfromtimestamp(mktime_tz(t))
@@ -345,11 +345,11 @@ def find_references(mail):

 if 'In-Reply-To' in mail:
 for in_reply_to in mail.get_all('In-Reply-To'):
-refs.append(in_reply_to.strip())
+refs.append(clean_header(in_reply_to).strip())

 if 'References' in mail:
 for references_header in mail.get_all('References'):
-references = references_header.split()
+references = clean_header(references_header).split()
 references.reverse()
 for ref in references:
 ref = ref.strip()
@@ -788,7 +788,7 @@ def parse_pull_request(content):

 def find_state(mail):
 """Return the state with the given name or the default."""
-state_name = mail.get('X-Patchwork-State', '').strip()
+state_name = clean_header(mail.get('X-Patchwork-State', '')).strip()
 if state_name:
 try:
 return State.objects.get(name__iexact=state_name)
@@ -825,7 +825,7 @@ def find_delegate_by_filename(project, filenames):

 def find_delegate_by_header(mail):
 """Return the delegate with the given email or None."""
-delegate_email = mail.get('X-Patchwork-Delegate', '').strip()
+delegate_email = clean_header(mail.get('X-Patchwork-Delegate', '')).strip()
 if delegate_email:
 try:
 return User.objects.get(email__iexact=delegate_email)
@@ -854,7 +854,7 @@ def parse_mail(mail, list_id=None):
 if 'Message-Id' not in mail:
 raise ValueError("Missing 'Message-Id' header")

-hint = mail.get('X-Patchwork-Hint', '').lower()
+hint = clean_header(mail.get('X-Patchwork-Hint', '')).lower()
 if hint == 'ignore':
 logger.debug("Ignoring email due to 'ignore' hint")
 return
@@ -870,7 +870,7 @@ def parse_mail(mail, list_id=None):

 # parse metadata

-msgid = mail.get('Message-Id').strip()
+msgid = clean_header(mail.get('Message-Id')).strip()
 author = find_author(mail)
 subject = mail.get('Subject')
 name, prefixes = clean_subject(subject, [project.linkname])



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Andrew Donnellan


Reviewed-by: Andrew Donnellan 

On 28/06/17 17:48, Daniel Axtens wrote:

The charset guessing algorithm doesn't work if it has to guess
multiple charsets, as it overwrites the payload with None.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 794a5eac2d29..4903aa8237e6 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -448,10 +448,11 @@ def _find_content(mail):

 for cset in try_charsets:
 try:
-payload = six.text_type(payload, cset)
+new_payload = six.text_type(payload, cset)
 break
 except UnicodeDecodeError:
-payload = None
+new_payload = None
+payload = new_payload

 # Could not find a valid decoded payload.  Fail.
 if payload is None:



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 10/10] [RFC] Fuzzing harness

2017-06-28 Thread Daniel Axtens

 - install python-afl in Docker (py2 doesn't seem to work)

 - change parser to return BrokenEmailException. This allows
   us to catch other sorts of ValueError.

 - fuzz management command to be used in py-afl-fuzz

Signed-off-by: Daniel Axtens 
---
 patchwork/management/commands/fuzz.py | 88 +++
 patchwork/parser.py   | 18 ---
 patchwork/tests/test_parser.py|  5 +-
 tools/docker/Dockerfile   |  2 +
 tools/fuzzer_dict | 52 +
 5 files changed, 156 insertions(+), 9 deletions(-)
 create mode 100644 patchwork/management/commands/fuzz.py
 create mode 100644 tools/fuzzer_dict

diff --git a/patchwork/management/commands/fuzz.py 
b/patchwork/management/commands/fuzz.py
new file mode 100644
index ..c2c08bcfbec2
--- /dev/null
+++ b/patchwork/management/commands/fuzz.py
@@ -0,0 +1,88 @@
+# Patchwork - automated patch tracking system
+# Copyright (C) 2016 Stephen Finucane 
+#
+# This file is part of the Patchwork package.
+#
+# Patchwork is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# Patchwork is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Patchwork; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+
+import email
+import logging
+
+from django.core.management import base
+from django.utils import six
+
+from patchwork.models import Person
+from patchwork.models import Patch
+from patchwork.models import Series
+from patchwork.models import CoverLetter
+from patchwork.models import Comment
+from patchwork.models import SeriesReference
+from patchwork.parser import parse_mail
+from patchwork.parser import BrokenEmailException
+
+import afl
+afl.init()
+
+logger = logging.getLogger(__name__)
+
+
+class Command(base.BaseCommand):
+help = 'Parse an mbox file and store any patch/comment found.'
+
+def add_arguments(self, parser):
+parser.add_argument(
+'infile',
+nargs=1,
+type=str,
+help='input mbox file')
+parser.add_argument(
+'--list-id',
+help='mailing list ID. If not supplied, this will be '
+'extracted from the mail headers.')
+
+def cleanup(self):
+Series.objects.all().delete()
+SeriesReference.objects.all().delete()
+Patch.objects.all().delete()
+Comment.objects.all().delete()
+CoverLetter.objects.all().delete()
+Person.objects.all().delete()
+
+def handle(self, *args, **options):
+infile = options['infile'][0]
+
+logger.info('Parsing mail loaded by filename')
+try:
+if six.PY3:
+with open(infile, 'rb') as file_:
+mail = email.message_from_binary_file(file_)
+else:
+with open(infile) as file_:
+mail = email.message_from_file(file_)
+except AttributeError:
+logger.warning("Broken email ignored")
+return
+
+try:
+parse_mail(mail, options['list_id'])
+self.cleanup()
+except BrokenEmailException:
+logger.warning("Broken email ignored")
+self.cleanup()
+except Exception as E:
+logger.exception('Error when parsing incoming email',
+ extra={'mail': mail.as_string()})
+self.cleanup()
+raise E
diff --git a/patchwork/parser.py b/patchwork/parser.py
index 46e6ca161574..eaeafa6f 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -54,6 +54,10 @@ SERIES_DELAY_INTERVAL = 10
 logger = logging.getLogger(__name__)
 
 
+class BrokenEmailException(Exception):
+pass
+
+
 def normalise_space(value):
 whitespace_re = re.compile(r'\s+')
 return whitespace_re.sub(' ', value).strip()
@@ -293,7 +297,7 @@ def find_author(mail):
 from_header = clean_header(mail.get('From'))
 
 if not from_header:
-raise ValueError("Invalid 'From' header")
+raise BrokenEmailException("Invalid 'From' header")
 
 name, email = (None, None)
 
@@ -324,7 +328,7 @@ def find_author(mail):
 break
 
 if not email:
-raise ValueError("Invalid 'From' header")
+raise BrokenEmailException("Invalid 'From' header")
 
 email = email.strip()
 if name is not None:
@@ -627,7 +631,7 @@ def clean_subject(subject, drop_prefixes=None):
 subject =

[PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

2017-06-28 Thread Daniel Axtens

The db limit is 255: we need to make sure we don't pass anything
longer in or it will throw an exception.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 80450c2e4860..46e6ca161574 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -225,7 +225,7 @@ def _find_series_by_references(project, mail):
 for ref in refs:
 try:
 return SeriesReference.objects.get(
-msgid=ref, series__project=project).series
+msgid=ref[:255], series__project=project).series
 except SeriesReference.DoesNotExist:
 continue
 
@@ -557,6 +557,7 @@ def find_comment_content(mail):
 
 def find_submission_for_comment(project, refs):
 for ref in refs:
+ref = ref[:255]
 # first, check for a direct reply
 try:
 submission = Submission.objects.get(project=project, msgid=ref)
@@ -934,7 +935,7 @@ def parse_mail(mail, list_id=None):
 msgid = clean_header(mail.get('Message-Id'))
 if not msgid:
 raise ValueError("Broken 'Message-Id' header")
-msgid = msgid.strip()
+msgid = msgid.strip()[:255]
 
 author = find_author(mail)
 subject = mail.get('Subject')
@@ -993,6 +994,7 @@ def parse_mail(mail, list_id=None):
 # be possible to identify the relationship between patches
 # as the earlier patch does not reference the later one.
 for ref in refs + [msgid]:
+ref = ref[:255]
 # we don't want duplicates
 try:
 # we could have a ref to a previous series. (For
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 09/10] Add fuzzer-generated tests

2017-06-28 Thread Daniel Axtens

This is the set of tests I generated while fuzzing the parser.

Add them so we don't regress.

Signed-off-by: Daniel Axtens 
---
 patchwork/tests/__init__.py  |   1 +
 patchwork/tests/fuzztests/base64err.mbox |  46 
 patchwork/tests/fuzztests/charset.mbox   | 131 +++
 patchwork/tests/fuzztests/codec-null.mbox| Bin 0 -> 8192 bytes
 patchwork/tests/fuzztests/date-oserror.mbox  | Bin 0 -> 8209 bytes
 patchwork/tests/fuzztests/date-too-long.mbox | Bin 0 -> 1828 bytes
 patchwork/tests/fuzztests/date.mbox  |  44 
 patchwork/tests/fuzztests/dateheader.mbox| Bin 0 -> 580 bytes
 patchwork/tests/fuzztests/earlyfail.mbox | Bin 0 -> 1712 bytes
 patchwork/tests/fuzztests/msgid-len.mbox | Bin 0 -> 1809 bytes
 patchwork/tests/fuzztests/msgid-len2.mbox|  37 +++
 patchwork/tests/fuzztests/msgidheader.mbox   | 131 +++
 patchwork/tests/fuzztests/refshdr.mbox   | Bin 0 -> 816 bytes
 patchwork/tests/fuzztests/unknown-encoding.mbox  | Bin 0 -> 1751 bytes
 patchwork/tests/fuzztests/value2.mbox| Bin 0 -> 1598 bytes
 patchwork/tests/fuzztests/year-out-of-range.mbox | Bin 0 -> 1660 bytes
 patchwork/tests/test_parser.py   |  57 +-
 17 files changed, 444 insertions(+), 3 deletions(-)
 create mode 100644 patchwork/tests/fuzztests/base64err.mbox
 create mode 100644 patchwork/tests/fuzztests/charset.mbox
 create mode 100644 patchwork/tests/fuzztests/codec-null.mbox
 create mode 100644 patchwork/tests/fuzztests/date-oserror.mbox
 create mode 100644 patchwork/tests/fuzztests/date-too-long.mbox
 create mode 100644 patchwork/tests/fuzztests/date.mbox
 create mode 100644 patchwork/tests/fuzztests/dateheader.mbox
 create mode 100644 patchwork/tests/fuzztests/earlyfail.mbox
 create mode 100644 patchwork/tests/fuzztests/msgid-len.mbox
 create mode 100644 patchwork/tests/fuzztests/msgid-len2.mbox
 create mode 100644 patchwork/tests/fuzztests/msgidheader.mbox
 create mode 100644 patchwork/tests/fuzztests/refshdr.mbox
 create mode 100644 patchwork/tests/fuzztests/unknown-encoding.mbox
 create mode 100644 patchwork/tests/fuzztests/value2.mbox
 create mode 100644 patchwork/tests/fuzztests/year-out-of-range.mbox

diff --git a/patchwork/tests/__init__.py b/patchwork/tests/__init__.py
index 8bdf1a6746db..cbe2b5d081dc 100644
--- a/patchwork/tests/__init__.py
+++ b/patchwork/tests/__init__.py
@@ -21,3 +21,4 @@ import os
 
 TEST_MAIL_DIR = os.path.join(os.path.dirname(__file__), 'mail')
 TEST_PATCH_DIR = os.path.join(os.path.dirname(__file__), 'patches')
+TEST_FUZZ_DIR = os.path.join(os.path.dirname(__file__), 'fuzztests')
diff --git a/patchwork/tests/fuzztests/base64err.mbox 
b/patchwork/tests/fuzztests/base64err.mbox
new file mode 100644
index ..9756d5c9c44e
--- /dev/null
+++ b/patchwork/tests/fuzztests/base64err.mbox
@@ -0,0 +1,46 @@
+From pat016
+Content-TCpe: text/plain; charset="utf-8"
+MIon: 1.0
+Content-Transfer-Encoding: 8bit
+Subject:: Up entry for B01X ARM
+From: =?utf-8?b?UmFmYcWCo alho Chehab 
+X-: 913
+Message-Id: <1m>
+To: Fan Flli m>
+Cc: bcm-k...@broadcom.com,
+ =?UTF-8?q?Rafa=C5=82=20Mi=C5:82ecki?= ,
+ An Morton ,
+ "David S. MilYer" ,
+ Greg Kroah-Hartman ,
+ Mauro alho Chehab ,
+ Guenter Roeck , Jiri Slaby ,
+ linux-ker...@vger.kernel.org ( list)
+Date:,  1 Jun 2016 22:00:54 +02
+Adh
+bci.
+
+>
+---
+ M++-
+(-)
+
+diff --ER@
+4
+--- a/ERS
 S
+@@ -2511,10 +2511,12 @@ F:*
+ 
+ BROURE
+ M:>
++Mfał
+ L:g
++Lcom
+ S:ed
+ 
+-F:
++F:i
+ F70*
+ 
+ B
diff --git a/patchwork/tests/fuzztests/charset.mbox 
b/patchwork/tests/fuzztests/charset.mbox
new file mode 100644
index ..a8a40c81f8d2
--- /dev/null
+++ b/patchwork/tests/fuzztests/charset.mbox
@@ -0,0 +1,131 @@
+Received: with ECARTIS (v1.0.0; list lin; Tue, 06 Dec 2011 01:49:42 +0100 (CET)
+ived: from mcom 337 O
+  om" -OK)
+by eddi with ESMTP id SAth PT
+ >); Tue, 6 Dec 2011 9:37 +0100
+Received:m caexch.com (Not Verified[092.168.16.9]) by mail3.caviumnetworks.com 
with Mal (
+id 00>; Mon, 05 Dec 2011 1604 -
+Received: from .caveonetworks.com ([192.168.16.9]) by 
caexch01.caveonetworks.com with M SMT
+   n, 5 Dec 2011 1636 -
+Received: from dd1.caveonetworks.com ([64.2.3.195]) by 
caexch01.caveonetworks.com over TLS red channel with Microsoft SMTPS790.4675);
+ Mon, 5 Dec 2011 9:35 -0800
+Message-ID: 
+Date:on, 05 Dec 2011 16:49:35 -0800
+From:avid Daney 
+nt: lla/5.0 (X11; U; Linux x86_64; en-US; rv5) Gecko/20101027 
Fedora/3.0.10-1.fc12 T0
+MIon: 1.0
+To: tils 
+CC:  s ,
+ l Lauss >,
+ n MIPS <>
+Subject: [Patch]: Fix ld p38 Fres on m.
+Content-Type: multipart/mixed;
+ boundary="080709040708040308010506"
+X-inalArrivalTime: 06 Dec 2011 00:49:35.0825 (UTC)

[PATCH 08/10] parse(mail|archive): handle early fail within email module

2017-06-28 Thread Daniel Axtens

Certain really messed up email messages can cause a failure within
the email module (at least on py3). Catch this.

Signed-off-by: Daniel Axtens 
---
 patchwork/management/commands/parsearchive.py |  9 
 patchwork/management/commands/parsemail.py| 31 ---
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/patchwork/management/commands/parsearchive.py 
b/patchwork/management/commands/parsearchive.py
index a3c8360186c8..3aab58a45bcd 100644
--- a/patchwork/management/commands/parsearchive.py
+++ b/patchwork/management/commands/parsearchive.py
@@ -77,6 +77,15 @@ class Command(BaseCommand):
 
 count = len(mbox)
 
+# detect broken mails in the mbox
+# see earlyfail fuzz test on py3
+try:
+for m in mbox:
+pass
+except AttributeError:
+logger.warning('Broken mbox/Maildir, aborting')
+return
+
 logger.info('Parsing %d mails', count)
 for i, msg in enumerate(mbox):
 try:
diff --git a/patchwork/management/commands/parsemail.py 
b/patchwork/management/commands/parsemail.py
index 9adfb25b09e3..52ec8bc56899 100644
--- a/patchwork/management/commands/parsemail.py
+++ b/patchwork/management/commands/parsemail.py
@@ -58,20 +58,25 @@ class Command(base.BaseCommand):
 def handle(self, *args, **options):
 infile = args[0] if args else options['infile']
 
-if infile:
-logger.info('Parsing mail loaded by filename')
-if six.PY3:
-with open(infile, 'rb') as file_:
-mail = email.message_from_binary_file(file_)
-else:
-with open(infile) as file_:
-mail = email.message_from_file(file_)
-else:
-logger.info('Parsing mail loaded from stdin')
-if six.PY3:
-mail = email.message_from_binary_file(sys.stdin.buffer)
+try:
+if infile:
+logger.info('Parsing mail loaded by filename')
+if six.PY3:
+with open(infile, 'rb') as file_:
+mail = email.message_from_binary_file(file_)
+else:
+with open(infile) as file_:
+mail = email.message_from_file(file_)
 else:
-mail = email.message_from_file(sys.stdin)
+logger.info('Parsing mail loaded from stdin')
+if six.PY3:
+mail = email.message_from_binary_file(sys.stdin.buffer)
+else:
+mail = email.message_from_file(sys.stdin)
+except AttributeError:
+logger.warning("Broken email ignored")
+return
+
 try:
 result = parse_mail(mail, options['list_id'])
 if result:
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 06/10] parser: better date parsing

2017-06-28 Thread Daniel Axtens

It turns out that there is a lot that can go wrong in parsing a
date. OverflowError, ValueError and OSError have all been observed.

If these go wrong, substitute the current datetime.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 203e11584504..80450c2e4860 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -344,10 +344,33 @@ def find_date(mail):
 h = clean_header(mail.get('Date', ''))
 if not h:
 return datetime.datetime.utcnow()
+
 t = parsedate_tz(h)
 if not t:
 return datetime.datetime.utcnow()
-return datetime.datetime.utcfromtimestamp(mktime_tz(t))
+
+try:
+d = datetime.datetime.utcfromtimestamp(mktime_tz(t))
+except OverflowError:
+# If you have a date like:
+# Date: Wed, 4 Jun 2014 17:50:46 0
+# then you can end up with:
+# OverflowError: Python int too large to convert to C long
+d = datetime.datetime.utcnow()
+except ValueError:
+# If you have a date like:
+# Date:, 11 Sep 2016 23:22:904070804030804 +0100
+# then you can end up with:
+# ValueError: year is out of range
+d = datetime.datetime.utcnow()
+except OSError:
+# If you have a date like:
+# Date:, 11 Sep 2016 407080403080105:04 +0100
+# then you can end up with (in py3)
+# OSError: [Errno 75] Value too large for defined data type
+d = datetime.datetime.utcnow()
+
+return d
 
 
 def find_headers(mail):
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 04/10] parser: catch failures in decoding headers

2017-06-28 Thread Daniel Axtens

Headers can fail to decode:
 - if a part cannot be encoded as ascii
 - if the coding hint names a codec that doesn't exist
 - if there's a null byte in the codec name

Catch these errors.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 19ed7ca4e315..032af8a7be7c 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -90,9 +90,11 @@ def sanitise_header(header_contents, header_name=None):
 header = make_header(value,
  header_name=header_name,
  continuation_ws='\t')
-except UnicodeDecodeError:
-# At least one of the parts cannot be encoded as ascii.
-# Find out which one and fix it somehow.
+except (UnicodeDecodeError, LookupError, ValueError, TypeError):
+#  - a part cannot be encoded as ascii. (UnicodeDecodeError), or
+#  - we don't have a codec matching the hint (LookupError)
+#  - the codec has a null byte (Py3 ValueError/Py2 TypeError)
+# Find out which part and fix it somehow.
 #
 # We get here under Py2 when there's non-7-bit chars in header,
 # or under Py2 or Py3 where decoding with the coding hint fails.
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 05/10] parser: deal with headers entirely failing to parse

2017-06-28 Thread Daniel Axtens

It turns out that the attempts in clean_header() to convert
headers to strings are not guaranteed to work: you can end up with,
for example, a base64 decoding error which makes it impossible
to determine any header content.

In this case, sanitise_header() should return None, and thus
clean_header() should return None. We then need to plumb that
through.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 70 +
 1 file changed, 55 insertions(+), 15 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 032af8a7be7c..203e11584504 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -23,6 +23,7 @@ from email.header import decode_header
 from email.header import make_header
 from email.utils import mktime_tz
 from email.utils import parsedate_tz
+from email.errors import HeaderParseError
 from fnmatch import fnmatch
 import logging
 import re
@@ -67,6 +68,13 @@ def sanitise_header(header_contents, header_name=None):
 then encode the result with make_header()
 """
 
+try:
+value = decode_header(header_contents)
+except HeaderParseError:
+# something has gone really wrong with header parsing.
+# (e.g. base64 decoding) We probably can't recover, so:
+return None
+
 # We have some Py2/Py3 issues here.
 #
 # Firstly, the email parser (before we get here)
@@ -85,7 +93,6 @@ def sanitise_header(header_contents, header_name=None):
 # We solve this by catching any Unicode errors, and then manually
 # handling any interesting headers.
 
-value = decode_header(header_contents)
 try:
 header = make_header(value,
  header_name=header_name,
@@ -130,6 +137,9 @@ def clean_header(header):
 
 sane_header = sanitise_header(header)
 
+if sane_header is None:
+return None
+
 # on Py2, we want to do unicode(), on Py3, str().
 # That gets us the decoded, un-wrapped header.
 if six.PY2:
@@ -157,9 +167,12 @@ def find_project_by_header(mail):
 
 for header in list_id_headers:
 if header in mail:
+h = clean_header(mail.get(header))
+if not h:
+continue
 
 for listid_re in listid_res:
-match = listid_re.match(clean_header(mail.get(header)))
+match = listid_re.match(h)
 if match:
 break
 
@@ -205,7 +218,11 @@ def _find_series_by_references(project, mail):
 Returns:
 The matching ``Series`` instance, if any
 """
-for ref in [clean_header(mail.get('Message-Id'))] + find_references(mail):
+refs = find_references(mail)
+h = clean_header(mail.get('Message-Id'))
+if h:
+refs = [h] + refs
+for ref in refs:
 try:
 return SeriesReference.objects.get(
 msgid=ref, series__project=project).series
@@ -274,6 +291,10 @@ def find_series(project, mail):
 
 def find_author(mail):
 from_header = clean_header(mail.get('From'))
+
+if not from_header:
+raise ValueError("Invalid 'From' header")
+
 name, email = (None, None)
 
 # tuple of (regex, fn)
@@ -320,7 +341,10 @@ def find_author(mail):
 
 
 def find_date(mail):
-t = parsedate_tz(clean_header(mail.get('Date', '')))
+h = clean_header(mail.get('Date', ''))
+if not h:
+return datetime.datetime.utcnow()
+t = parsedate_tz(h)
 if not t:
 return datetime.datetime.utcnow()
 return datetime.datetime.utcfromtimestamp(mktime_tz(t))
@@ -331,7 +355,7 @@ def find_headers(mail):
for key, value in mail.items()]
 
 strings = [('%s: %s' % (key, header.encode()))
-   for (key, header) in headers]
+   for (key, header) in headers if header is not None]
 
 return '\n'.join(strings)
 
@@ -347,11 +371,16 @@ def find_references(mail):
 
 if 'In-Reply-To' in mail:
 for in_reply_to in mail.get_all('In-Reply-To'):
-refs.append(clean_header(in_reply_to).strip())
+r = clean_header(in_reply_to)
+if r:
+refs.append(r.strip())
 
 if 'References' in mail:
 for references_header in mail.get_all('References'):
-references = clean_header(references_header).split()
+h = clean_header(references_header)
+if not h:
+continue
+references = h.split()
 references.reverse()
 for ref in references:
 ref = ref.strip()
@@ -573,6 +602,9 @@ def clean_subject(subject, drop_prefixes=None):
 prefix_re = re.compile(r'^\[([^\]]*)\]\s*(.*)$')
 subject = clean_header(subject)
 
+if not subject:
+raise ValueError("Invalid 'Subject' header")
+
 if drop_prefixes is None:
 drop_prefixes = []
 else:
@@ -610,7 +642,11 @@ def subject_check(subject):
 """Determine if a mail is a reply."""

[PATCH 02/10] parser: don't assume headers are strings

2017-06-28 Thread Daniel Axtens

In python3, mail.get() can return either a string, or an
email.header.Header type.

clean_header() is designed to clean headers into strings,
so make sure we use that everywhere.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 4903aa8237e6..3ab4eb3d2011 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -157,7 +157,7 @@ def find_project_by_header(mail):
 if header in mail:
 
 for listid_re in listid_res:
-match = listid_re.match(mail.get(header))
+match = listid_re.match(clean_header(mail.get(header)))
 if match:
 break
 
@@ -203,7 +203,7 @@ def _find_series_by_references(project, mail):
 Returns:
 The matching ``Series`` instance, if any
 """
-for ref in [mail.get('Message-Id')] + find_references(mail):
+for ref in [clean_header(mail.get('Message-Id'))] + find_references(mail):
 try:
 return SeriesReference.objects.get(
 msgid=ref, series__project=project).series
@@ -318,7 +318,7 @@ def find_author(mail):
 
 
 def find_date(mail):
-t = parsedate_tz(mail.get('Date', ''))
+t = parsedate_tz(clean_header(mail.get('Date', '')))
 if not t:
 return datetime.datetime.utcnow()
 return datetime.datetime.utcfromtimestamp(mktime_tz(t))
@@ -345,11 +345,11 @@ def find_references(mail):
 
 if 'In-Reply-To' in mail:
 for in_reply_to in mail.get_all('In-Reply-To'):
-refs.append(in_reply_to.strip())
+refs.append(clean_header(in_reply_to).strip())
 
 if 'References' in mail:
 for references_header in mail.get_all('References'):
-references = references_header.split()
+references = clean_header(references_header).split()
 references.reverse()
 for ref in references:
 ref = ref.strip()
@@ -788,7 +788,7 @@ def parse_pull_request(content):
 
 def find_state(mail):
 """Return the state with the given name or the default."""
-state_name = mail.get('X-Patchwork-State', '').strip()
+state_name = clean_header(mail.get('X-Patchwork-State', '')).strip()
 if state_name:
 try:
 return State.objects.get(name__iexact=state_name)
@@ -825,7 +825,7 @@ def find_delegate_by_filename(project, filenames):
 
 def find_delegate_by_header(mail):
 """Return the delegate with the given email or None."""
-delegate_email = mail.get('X-Patchwork-Delegate', '').strip()
+delegate_email = clean_header(mail.get('X-Patchwork-Delegate', '')).strip()
 if delegate_email:
 try:
 return User.objects.get(email__iexact=delegate_email)
@@ -854,7 +854,7 @@ def parse_mail(mail, list_id=None):
 if 'Message-Id' not in mail:
 raise ValueError("Missing 'Message-Id' header")
 
-hint = mail.get('X-Patchwork-Hint', '').lower()
+hint = clean_header(mail.get('X-Patchwork-Hint', '')).lower()
 if hint == 'ignore':
 logger.debug("Ignoring email due to 'ignore' hint")
 return
@@ -870,7 +870,7 @@ def parse_mail(mail, list_id=None):
 
 # parse metadata
 
-msgid = mail.get('Message-Id').strip()
+msgid = clean_header(mail.get('Message-Id')).strip()
 author = find_author(mail)
 subject = mail.get('Subject')
 name, prefixes = clean_subject(subject, [project.linkname])
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

2017-06-28 Thread Daniel Axtens

On Python3 this presents as a ValueError
On Python2 this presents as a TypeError

In both cases, catch these exceptions.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 3ab4eb3d2011..19ed7ca4e315 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -436,7 +436,7 @@ def _find_content(mail):
 if charset is not None:
 try:
 codecs.lookup(charset)
-except LookupError:
+except (LookupError, ValueError, TypeError):
 charset = None
 
 # If there is no charset or if it is unknown, then try some common
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 01/10] parser: fix charset 'guessing' algorithm

2017-06-28 Thread Daniel Axtens

The charset guessing algorithm doesn't work if it has to guess
multiple charsets, as it overwrites the payload with None.

Signed-off-by: Daniel Axtens 
---
 patchwork/parser.py | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/patchwork/parser.py b/patchwork/parser.py
index 794a5eac2d29..4903aa8237e6 100644
--- a/patchwork/parser.py
+++ b/patchwork/parser.py
@@ -448,10 +448,11 @@ def _find_content(mail):
 
 for cset in try_charsets:
 try:
-payload = six.text_type(payload, cset)
+new_payload = six.text_type(payload, cset)
 break
 except UnicodeDecodeError:
-payload = None
+new_payload = None
+payload = new_payload
 
 # Could not find a valid decoded payload.  Fail.
 if payload is None:
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH 00/10] Fuzzing the parser: fixes

2017-06-28 Thread Daniel Axtens

Hi all,

I fuzzed parsemail with python-afl/afl-fuzz.

Predictably, it did not hold up well. Nothing major, just a bunch of
edge cases. Date parsing held up particularly poorly.

Currently, if we hit any of the errors this patch set fixes, the
entire mail will be rejected. With this patch set, the emails have a
much better chance of surviving. This is helpful for us - we have
previously had bug reports where a stray mail has been rejected due to
a corrupt header, which is suboptimal.

Test cases are included. They may not email particularly well so I
have also put them up on my github as a signed tag:
https://github.com/daxtens/patchwork/releases/tag/fuzz-testing

For the interested, the bulk of the fuzzing was done on an AWS cloud
instance, using something quite similar to the standard docker-compose
setup. Several hundered thousand executions were done - nowhere near
what I'd like but I can't seem to get execution speed above about 1
execution per second. If I was able to confidently mock out the db or
extract the parser out of the django framework it would go faster, but
both of those were a bit more than I was easily able to do. I also
picked up some bugs when interfacing with the db (patch 7) so I feel
like this approach was vindicated.

I also included a patch to allow people to replicate my setup. It's
not really ready to merge and I'm not convinced it's really necessary
as it's not a lot of work to replicate.

Regards,
Daniel

Daniel Axtens (10):
  parser: fix charset 'guessing' algorithm
  parser: don't assume headers are strings
  parser: codec lookup fails when a NUL (\x00) is in the name
  parser: catch failures in decoding headers
  parser: deal with headers entirely failing to parse
  parser: better date parsing
  parser: Don't pass a message-id longer than 255 chars to the db
  parse(mail|archive): handle early fail within email module
  Add fuzzer-generated tests
  [RFC] Fuzzing harness

 patchwork/management/commands/fuzz.py|  88 +++
 patchwork/management/commands/parsearchive.py|   9 ++
 patchwork/management/commands/parsemail.py   |  31 +++---
 patchwork/parser.py  | 126 +-
 patchwork/tests/__init__.py  |   1 +
 patchwork/tests/fuzztests/base64err.mbox |  46 
 patchwork/tests/fuzztests/charset.mbox   | 131 +++
 patchwork/tests/fuzztests/codec-null.mbox| Bin 0 -> 8192 bytes
 patchwork/tests/fuzztests/date-oserror.mbox  | Bin 0 -> 8209 bytes
 patchwork/tests/fuzztests/date-too-long.mbox | Bin 0 -> 1828 bytes
 patchwork/tests/fuzztests/date.mbox  |  44 
 patchwork/tests/fuzztests/dateheader.mbox| Bin 0 -> 580 bytes
 patchwork/tests/fuzztests/earlyfail.mbox | Bin 0 -> 1712 bytes
 patchwork/tests/fuzztests/msgid-len.mbox | Bin 0 -> 1809 bytes
 patchwork/tests/fuzztests/msgid-len2.mbox|  37 +++
 patchwork/tests/fuzztests/msgidheader.mbox   | 131 +++
 patchwork/tests/fuzztests/refshdr.mbox   | Bin 0 -> 816 bytes
 patchwork/tests/fuzztests/unknown-encoding.mbox  | Bin 0 -> 1751 bytes
 patchwork/tests/fuzztests/value2.mbox| Bin 0 -> 1598 bytes
 patchwork/tests/fuzztests/year-out-of-range.mbox | Bin 0 -> 1660 bytes
 patchwork/tests/test_parser.py   |  60 ++-
 tools/docker/Dockerfile  |   2 +
 tools/fuzzer_dict|  52 +
 23 files changed, 714 insertions(+), 44 deletions(-)
 create mode 100644 patchwork/management/commands/fuzz.py
 create mode 100644 patchwork/tests/fuzztests/base64err.mbox
 create mode 100644 patchwork/tests/fuzztests/charset.mbox
 create mode 100644 patchwork/tests/fuzztests/codec-null.mbox
 create mode 100644 patchwork/tests/fuzztests/date-oserror.mbox
 create mode 100644 patchwork/tests/fuzztests/date-too-long.mbox
 create mode 100644 patchwork/tests/fuzztests/date.mbox
 create mode 100644 patchwork/tests/fuzztests/dateheader.mbox
 create mode 100644 patchwork/tests/fuzztests/earlyfail.mbox
 create mode 100644 patchwork/tests/fuzztests/msgid-len.mbox
 create mode 100644 patchwork/tests/fuzztests/msgid-len2.mbox
 create mode 100644 patchwork/tests/fuzztests/msgidheader.mbox
 create mode 100644 patchwork/tests/fuzztests/refshdr.mbox
 create mode 100644 patchwork/tests/fuzztests/unknown-encoding.mbox
 create mode 100644 patchwork/tests/fuzztests/value2.mbox
 create mode 100644 patchwork/tests/fuzztests/year-out-of-range.mbox
 create mode 100644 tools/fuzzer_dict

-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Andrew Donnellan


On 28/06/17 16:55, Daniel Axtens wrote:

This is just a matter of correctly closing files we open.

Signed-off-by: Daniel Axtens 


Reviewed-by: Andrew Donnellan 


---
 patchwork/management/commands/parsearchive.py | 1 +
 patchwork/tests/test_management.py| 2 ++
 patchwork/tests/test_series.py| 9 +
 patchwork/tests/utils.py  | 4 +++-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/patchwork/management/commands/parsearchive.py 
b/patchwork/management/commands/parsearchive.py
index a3c8360186c8..4e102a988e76 100644
--- a/patchwork/management/commands/parsearchive.py
+++ b/patchwork/management/commands/parsearchive.py
@@ -114,3 +114,4 @@ class Command(BaseCommand):
 'errors': errors,
 'new': count - duplicates - dropped - errors,
 })
+mbox.close()
diff --git a/patchwork/tests/test_management.py 
b/patchwork/tests/test_management.py
index f9166739986b..f548fce3b8e5 100644
--- a/patchwork/tests/test_management.py
+++ b/patchwork/tests/test_management.py
@@ -52,6 +52,7 @@ class ParsemailTest(TestCase):
 with self.assertRaises(SystemExit) as exc:
 call_command('parsemail', infile=None)

+sys.stdin.close()
 self.assertEqual(exc.exception.code, 1)

 def test_valid_path(self):
@@ -78,6 +79,7 @@ class ParsemailTest(TestCase):
 call_command('parsemail', infile=None,
  list_id=project.listid)

+sys.stdin.close()
 self.assertEqual(exc.exception.code, 0)

 count = models.Patch.objects.filter(project=project.id).count()
diff --git a/patchwork/tests/test_series.py b/patchwork/tests/test_series.py
index 181fc6d2ac1c..53b5c63e8fa9 100644
--- a/patchwork/tests/test_series.py
+++ b/patchwork/tests/test_series.py
@@ -54,6 +54,7 @@ class _BaseTestCase(TestCase):
 results[1].append(obj)
 else:
 results[2].append(obj)
+mbox.close()

 self.assertParsed(results, counts)

@@ -601,6 +602,8 @@ class SeriesNameTestCase(TestCase):
 self._parse_mail(mbox[2])
 self.assertEqual(cover.latest_series.name, cover_name)

+mbox.close()
+
 def test_no_cover_letter(self):
 """Series without a cover letter.

@@ -621,6 +624,8 @@ class SeriesNameTestCase(TestCase):
 self._parse_mail(mbox[1])
 self.assertEqual(series.name, patch.name)

+mbox.close()
+
 def test_out_of_order(self):
 """Series received out of order.

@@ -645,6 +650,8 @@ class SeriesNameTestCase(TestCase):
 cover = self._parse_mail(mbox[2])
 self.assertEqual(cover.latest_series.name, self._format_name(cover))

+mbox.close()
+
 def test_custom_name(self):
 """Series with custom name.

@@ -673,3 +680,5 @@ class SeriesNameTestCase(TestCase):

 self._parse_mail(mbox[2])
 self.assertEqual(series.name, series_name)
+
+mbox.close()
diff --git a/patchwork/tests/utils.py b/patchwork/tests/utils.py
index 3d0293cab36b..d4005c7729e8 100644
--- a/patchwork/tests/utils.py
+++ b/patchwork/tests/utils.py
@@ -54,7 +54,9 @@ def read_patch(filename, encoding=None):
 else:
 f = open(file_path)

-return f.read()
+result = f.read()
+f.close()
+return result


 error_strings = {



--
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

[PATCH] Remove ResourceWarnings under Py3

2017-06-28 Thread Daniel Axtens

This is just a matter of correctly closing files we open.

Signed-off-by: Daniel Axtens 
---
 patchwork/management/commands/parsearchive.py | 1 +
 patchwork/tests/test_management.py| 2 ++
 patchwork/tests/test_series.py| 9 +
 patchwork/tests/utils.py  | 4 +++-
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/patchwork/management/commands/parsearchive.py 
b/patchwork/management/commands/parsearchive.py
index a3c8360186c8..4e102a988e76 100644
--- a/patchwork/management/commands/parsearchive.py
+++ b/patchwork/management/commands/parsearchive.py
@@ -114,3 +114,4 @@ class Command(BaseCommand):
 'errors': errors,
 'new': count - duplicates - dropped - errors,
 })
+mbox.close()
diff --git a/patchwork/tests/test_management.py 
b/patchwork/tests/test_management.py
index f9166739986b..f548fce3b8e5 100644
--- a/patchwork/tests/test_management.py
+++ b/patchwork/tests/test_management.py
@@ -52,6 +52,7 @@ class ParsemailTest(TestCase):
 with self.assertRaises(SystemExit) as exc:
 call_command('parsemail', infile=None)
 
+sys.stdin.close()
 self.assertEqual(exc.exception.code, 1)
 
 def test_valid_path(self):
@@ -78,6 +79,7 @@ class ParsemailTest(TestCase):
 call_command('parsemail', infile=None,
  list_id=project.listid)
 
+sys.stdin.close()
 self.assertEqual(exc.exception.code, 0)
 
 count = models.Patch.objects.filter(project=project.id).count()
diff --git a/patchwork/tests/test_series.py b/patchwork/tests/test_series.py
index 181fc6d2ac1c..53b5c63e8fa9 100644
--- a/patchwork/tests/test_series.py
+++ b/patchwork/tests/test_series.py
@@ -54,6 +54,7 @@ class _BaseTestCase(TestCase):
 results[1].append(obj)
 else:
 results[2].append(obj)
+mbox.close()
 
 self.assertParsed(results, counts)
 
@@ -601,6 +602,8 @@ class SeriesNameTestCase(TestCase):
 self._parse_mail(mbox[2])
 self.assertEqual(cover.latest_series.name, cover_name)
 
+mbox.close()
+
 def test_no_cover_letter(self):
 """Series without a cover letter.
 
@@ -621,6 +624,8 @@ class SeriesNameTestCase(TestCase):
 self._parse_mail(mbox[1])
 self.assertEqual(series.name, patch.name)
 
+mbox.close()
+
 def test_out_of_order(self):
 """Series received out of order.
 
@@ -645,6 +650,8 @@ class SeriesNameTestCase(TestCase):
 cover = self._parse_mail(mbox[2])
 self.assertEqual(cover.latest_series.name, self._format_name(cover))
 
+mbox.close()
+
 def test_custom_name(self):
 """Series with custom name.
 
@@ -673,3 +680,5 @@ class SeriesNameTestCase(TestCase):
 
 self._parse_mail(mbox[2])
 self.assertEqual(series.name, series_name)
+
+mbox.close()
diff --git a/patchwork/tests/utils.py b/patchwork/tests/utils.py
index 3d0293cab36b..d4005c7729e8 100644
--- a/patchwork/tests/utils.py
+++ b/patchwork/tests/utils.py
@@ -54,7 +54,9 @@ def read_patch(filename, encoding=None):
 else:
 f = open(file_path)
 
-return f.read()
+result = f.read()
+f.close()
+return result
 
 
 error_strings = {
-- 
2.11.0

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork

Re: [PATCH 00/10] Fuzzing the parser: fixes

Re: [PATCH 10/10] [RFC] Fuzzing harness

Re: [PATCH 09/10] Add fuzzer-generated tests

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

Re: [PATCH 06/10] parser: better date parsing

Re: [PATCH 02/10] parser: don't assume headers are strings

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

Re: [PATCH] Remove ResourceWarnings under Py3

Re: [PATCH RFC] events-api: allow filtering by date

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

Re: [PATCH 06/10] parser: better date parsing

Re: Patchwork v2 and beyond

Re: [PATCH 08/10] parse(mail|archive): handle early fail within email module

Re: [PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

Re: [PATCH 06/10] parser: better date parsing

Re: [PATCH 05/10] parser: deal with headers entirely failing to parse

Re: [PATCH 04/10] parser: catch failures in decoding headers

Re: [PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

Re: [PATCH 02/10] parser: don't assume headers are strings

Re: [PATCH 01/10] parser: fix charset 'guessing' algorithm

[PATCH 10/10] [RFC] Fuzzing harness

[PATCH 07/10] parser: Don't pass a message-id longer than 255 chars to the db

[PATCH 09/10] Add fuzzer-generated tests

[PATCH 08/10] parse(mail|archive): handle early fail within email module

[PATCH 06/10] parser: better date parsing

[PATCH 04/10] parser: catch failures in decoding headers

[PATCH 05/10] parser: deal with headers entirely failing to parse

[PATCH 02/10] parser: don't assume headers are strings

[PATCH 03/10] parser: codec lookup fails when a NUL (\x00) is in the name

[PATCH 01/10] parser: fix charset 'guessing' algorithm

[PATCH 00/10] Fuzzing the parser: fixes

Re: [PATCH] Remove ResourceWarnings under Py3

[PATCH] Remove ResourceWarnings under Py3

35 matches

Site Navigation

Mail list logo

Footer information