Re: Issues when reading mailboxes from alioth-lists.debian.net
On Wed, Aug 19, 2020 at 10:31:55PM +0530, Nilesh Patra wrote:
>
> For me the error goes away when I change line 781 in
> /usr/lib/python3.8/mailbox.py
> to:
>
>     msg.set_from(from_line[5:].decode('utf-8'))
>
> Maybe this is a minor feature enhancement, since at the moment messages with
> Unicode characters don't seem to be decoded.
> Or there's an API change which I'm not aware of.
>
> Either way, this should act as a temporary fix for now. Let me know if this
> doesn't seem right.

BTW, it's a regression compared to Python2. When running the test script with python2 (python2 test_mbox.py), everything works.

Kind regards

Andreas.

-- 
http://fam-tille.de
Re: Issues when reading mailboxes from alioth-lists.debian.net
Hi,

> Traceback (most recent call last):
>   File "./test_mbox.py", line 6, in <module>
>     if mbox_file.items() != []:
>   File "/usr/lib/python3.8/mailbox.py", line 132, in items
>     return list(self.iteritems())
>   File "/usr/lib/python3.8/mailbox.py", line 125, in iteritems
>     value = self[key]
>   File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
>     return self.get_message(key)
>   File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
>     msg.set_from(from_line[5:].decode('ascii'))
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37:
> ordinal not in range(128)
> Exit code: 1

For me the error goes away when I change line 781 in /usr/lib/python3.8/mailbox.py to:

    msg.set_from(from_line[5:].decode('utf-8'))

Maybe this is a minor feature enhancement, since at the moment messages with non-ASCII characters in the "From " line don't seem to be decoded. Or there's an API change which I'm not aware of.

Either way, this should act as a temporary fix for now. Let me know if this doesn't seem right.

Kind regards,
Nilesh
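A minimal sketch of why the one-line change helps and where it still falls short: decode('ascii') rejects every byte >= 0x80, such as the 0xe2 in the traceback, while decode('utf-8') accepts it as long as the bytes form valid UTF-8 (presumably the case for this archive, since the patch works). Bytes in another charset, say Latin-1, would still raise with strict UTF-8; adding errors='replace' makes the decode never fail. The helper name decode_from_line and the sample envelope lines are made up for illustration.

```python
# Hypothetical helper: decodes the part of an mbox envelope line after "From ".
def decode_from_line(from_line: bytes) -> str:
    # mailbox.py in Python 3.8 uses from_line[5:].decode('ascii'), which raises
    # UnicodeDecodeError for any byte >= 0x80. UTF-8 with errors='replace'
    # decodes valid UTF-8 normally and turns invalid bytes into U+FFFD.
    return from_line[5:].decode('utf-8', errors='replace')


if __name__ == "__main__":
    # Made-up envelope lines: one containing UTF-8 (U+2019), one Latin-1.
    utf8_line = b"From O\xe2\x80\x99Neil Sat May  2 10:00:00 2020"
    latin1_line = b"From J\xfcrgen Sat May  2 10:00:00 2020"

    for label, line in (("utf-8", utf8_line), ("latin-1", latin1_line)):
        for codec in ("ascii", "utf-8"):
            try:
                line[5:].decode(codec)
                print(f"{label} line, strict {codec}: ok")
            except UnicodeDecodeError:
                print(f"{label} line, strict {codec}: UnicodeDecodeError")
        print(f"{label} line, lenient:", decode_from_line(line))
```

Strict ASCII fails on both lines and strict UTF-8 fails on the Latin-1 one, which is why the patched line is best treated as a temporary fix. Decoding with 'latin-1' would also never raise, but it silently turns UTF-8 text into mojibake, whereas 'replace' at least marks the bad bytes.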
Issues when reading mailboxes from alioth-lists.debian.net
Hi,

in the teammetrics project I'm trying to parse mailboxes. This worked with Python2, but after porting the code to Python3 I get some encoding troubles. One specific problem seems to be an error in the mailbox module. Please run the attached shell script test_mbox, which downloads one of the critical mbox files from alioth-lists.debian.net and calls the also attached simple Python3 script test_mbox.py. It ends in:

Traceback (most recent call last):
  File "./test_mbox.py", line 6, in <module>
    if mbox_file.items() != []:
  File "/usr/lib/python3.8/mailbox.py", line 132, in items
    return list(self.iteritems())
  File "/usr/lib/python3.8/mailbox.py", line 125, in iteritems
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 37: ordinal not in range(128)
Exit code: 1

IMHO it is a bug if those mailboxes can't be read. Am I missing something?

Kind regards

Andreas.

test_mbox:

#!/bin/sh
wget https://alioth-lists.debian.net/pipermail/pkg-java-maintainers/2020-May.txt.gz
gunzip 2020-May.txt.gz
python3 test_mbox.py

test_mbox.py:

#!/usr/bin/python3

import mailbox

mbox_file = mailbox.mbox('2020-May.txt')
if mbox_file.items() != []:
    print("OK")
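A workaround that avoids editing /usr/lib/python3.8/mailbox.py in place (a package upgrade would overwrite the change) is to subclass mailbox.mbox and override get_message(). This is a sketch under two assumptions: the class name LenientMbox is made up, and it relies on the from_ keyword of get_bytes(), which exists in CPython's mailbox.py for mbox files but is not part of the documented API.

```python
import mailbox


class LenientMbox(mailbox.mbox):
    """Hypothetical mbox subclass that never decodes "From " lines as ASCII."""

    def get_message(self, key):
        # from_=True is an undocumented keyword of CPython's mbox.get_bytes();
        # it keeps the envelope "From " line at the start of the bytes.
        raw = self.get_bytes(key, from_=True)
        from_line, _, rest = raw.partition(b"\n")
        # mboxMessage accepts bytes and parses headers and body from them.
        msg = mailbox.mboxMessage(rest)
        # Same slice as mailbox.py, but decoded leniently instead of as ASCII.
        msg.set_from(from_line[5:].decode("utf-8", errors="replace"))
        return msg
```

In test_mbox.py this would mean writing mbox_file = LenientMbox('2020-May.txt') instead of mailbox.mbox('2020-May.txt'); items() still goes through __getitem__, which calls the overridden get_message(). If the envelope line is not needed at all, email.message_from_bytes(mbox_file.get_bytes(key)) sticks to the documented API, since plain get_bytes(key) returns the message without that line.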