Andrew Donnellan <andrew.donnel...@au1.ibm.com> writes: > On 28/06/17 17:48, Daniel Axtens wrote: >> Certain really messed up email messages can cause a failure within >> the email module (at least on py3). Catch this. >> >> Signed-off-by: Daniel Axtens <d...@axtens.net> >> --- >> patchwork/management/commands/parsearchive.py | 9 ++++++++ >> patchwork/management/commands/parsemail.py | 31 >> ++++++++++++++++----------- >> 2 files changed, 27 insertions(+), 13 deletions(-) >> >> diff --git a/patchwork/management/commands/parsearchive.py >> b/patchwork/management/commands/parsearchive.py >> index a3c8360186c8..3aab58a45bcd 100644 >> --- a/patchwork/management/commands/parsearchive.py >> +++ b/patchwork/management/commands/parsearchive.py >> @@ -77,6 +77,15 @@ class Command(BaseCommand): >> >> count = len(mbox) >> >> + # detect broken mails in the mbox >> + # see earlyfail fuzz test on py3 >> + try: >> + for m in mbox: >> + pass >> + except AttributeError: >> + logger.warning('Broken mbox/Maildir, aborting') >> + return >> + > > This seems a bit non-obvious and could do with a little bit of explanation?
The message, or the code structure? I structured the code this way rather than the more obvious try: mbox = [m for m in mbox] ... because the more obvious way requires loading the entire mbox/maildir into memory and I was a bit worried about the memory consumption of that when parsing a large mbox. I agree a more helpful comment would have been in order. Stephen, do you want a v2 of this patch by itself? I can resend the series but it seems a bit excessive... Or I could do a follow-up. Regards, Daniel > >> logger.info('Parsing %d mails', count) >> for i, msg in enumerate(mbox): >> try: >> diff --git a/patchwork/management/commands/parsemail.py >> b/patchwork/management/commands/parsemail.py >> index 9adfb25b09e3..52ec8bc56899 100644 >> --- a/patchwork/management/commands/parsemail.py >> +++ b/patchwork/management/commands/parsemail.py >> @@ -58,20 +58,25 @@ class Command(base.BaseCommand): >> def handle(self, *args, **options): >> infile = args[0] if args else options['infile'] >> >> - if infile: >> - logger.info('Parsing mail loaded by filename') >> - if six.PY3: >> - with open(infile, 'rb') as file_: >> - mail = email.message_from_binary_file(file_) >> - else: >> - with open(infile) as file_: >> - mail = email.message_from_file(file_) >> - else: >> - logger.info('Parsing mail loaded from stdin') >> - if six.PY3: >> - mail = email.message_from_binary_file(sys.stdin.buffer) >> + try: >> + if infile: >> + logger.info('Parsing mail loaded by filename') >> + if six.PY3: >> + with open(infile, 'rb') as file_: >> + mail = email.message_from_binary_file(file_) >> + else: >> + with open(infile) as file_: >> + mail = email.message_from_file(file_) >> else: >> - mail = email.message_from_file(sys.stdin) >> + logger.info('Parsing mail loaded from stdin') >> + if six.PY3: >> + mail = email.message_from_binary_file(sys.stdin.buffer) >> + else: >> + mail = email.message_from_file(sys.stdin) >> + except AttributeError: >> + logger.warning("Broken email ignored") >> + return >> + >> try: >> result = parse_mail(mail, options['list_id']) >> if result: >> > > -- > Andrew Donnellan OzLabs, ADL Canberra > andrew.donnel...@au1.ibm.com IBM Australia Limited _______________________________________________ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork