A.M. Kuchling wrote:
> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
>> I will leave the profiler output to speak for itself, since I can find
>> nothing much to say about it except that there's a hell of a lot of
>> decoding going on inside mailbox.iterkeys().
>
> The problem is actually in _generate_toc(), which is reading through
> the entire file to figure out where all the 'From' lines that start
> messages are located.  TextIOWrapper()'s tell() method seems to be
> very slow, so one help is to only call tell() when necessary; patch:
>
> -> svn diff Lib/
> Index: Lib/mailbox.py
> ===================================================================
> --- Lib/mailbox.py	(revision 82346)
> +++ Lib/mailbox.py	(working copy)
> @@ -775,13 +775,14 @@
>          starts, stops = [], []
>          self._file.seek(0)
>          while True:
> -            line_pos = self._file.tell()
>              line = self._file.readline()
>              if line.startswith('From '):
> +                line_pos = self._file.tell()
>                  if len(stops) < len(starts):
>                      stops.append(line_pos - len(os.linesep))
>                  starts.append(line_pos)
>              elif not line:
> +                line_pos = self._file.tell()
>                  stops.append(line_pos)
>                  break
>          self._toc = dict(enumerate(zip(starts, stops)))
>
> But should mailboxes really be opened in a UTF-8 encoding, or should
> they be treated as 7-bit text?  I'll have to think about this.
Neither! You can't open them as 7-bit text, because real-world email
does contain bytes whose ordinal value exceeds 127. You can't open them
using a text encoding because theoretically there might be ASCII headers
that indicate that parts of the content are in specific character sets
or encodings.

If only we had a data structure that easily allowed us to manipulate
8-bit characters ...

regards
 Steve
-- 
Steve Holden
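
For illustration only, here is a rough sketch of what a bytes-based scan
might look like. It is not the actual mailbox.py code: the function name
generate_toc, the sample data, and the assumption of b'\n' line endings
are all hypothetical. The file is read in binary mode and offsets are
tracked by summing line lengths, so nothing is decoded and tell() is
never called.

```python
import io

def generate_toc(f, linesep=b'\n'):
    """Map message index -> (start, stop) byte offsets in a binary mbox.

    Hypothetical helper: assumes `f` is opened in binary mode and that
    messages are separated by lines beginning with b'From '.
    """
    starts, stops = [], []
    f.seek(0)
    pos = 0                      # running byte offset, replaces tell()
    while True:
        line_pos = pos           # offset where the current line begins
        line = f.readline()
        pos += len(line)
        if line.startswith(b'From '):
            if len(stops) < len(starts):
                # Close the previous message, dropping the separator newline.
                stops.append(line_pos - len(linesep))
            starts.append(line_pos)
        elif not line:           # end of file
            stops.append(line_pos)
            break
    return dict(enumerate(zip(starts, stops)))

# Made-up sample: the first body holds a raw 0xE9 byte, which is neither
# 7-bit ASCII nor valid UTF-8 on its own.
data = (b"From a@example.com Tue Jun 29 11:40:50 2010\n"
        b"Subject: one\n\ncaf\xe9\n\n"
        b"From b@example.com Tue Jun 29 11:41:00 2010\n"
        b"Subject: two\n\nbody\n")
toc = generate_toc(io.BytesIO(data))
```

Because the offsets are plain byte counts, each message can later be
sliced out of the file unchanged and any decoding left to whatever the
message headers declare.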