Petri Lehtinen <pe...@digip.org> added the comment:

Actually, you're right. Sorry for overlooking the RFC. But that said, the RFC 
itself refers to the same manpage as a reference that's "mostly authoritative 
for those variations that are otherwise only documented in anecdotal form". So 
I guess it's quite a good reference after all :)

In Appendix A, RFC 4155 defines a set of rules for a "default" mbox format that 
maximizes interoperability between different mbox implementations.

The important things in the RFC concerning this issue are:

* There MUST be an empty line after each message.

* The RFC does not specify any escape syntax for message body lines starting 
with "From ". It says: "Recipient systems are expected to parse full separator 
lines as they are documented above."

Because the RFC states that there must be an empty line after each message, and 
it aims for maximum interoperability, I think we can assume that there is always 
an empty line there. But looking for "\n\nFrom " is not enough to find the 
starting points of messages, because an unescaped body line can also start with 
"From " right after an empty line. We should actually parse the whole separator 
line, which consists of "From ", an email address (addr-spec in RFC 2822), a 
timestamp (in UNIX ctime format without timezone), and a newline character.
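
As a rough sketch of what parsing the whole separator line could look like: 
the names is_separator and message_starts are made up for illustration and are 
not part of the mailbox module, and the address pattern is only a crude 
stand-in for the full addr-spec grammar (no quoted local parts or domain 
literals).

```python
import re
import time

# Hypothetical helper pattern: "From ", a simplified address, then the rest
# of the line, which is checked as a ctime-style timestamp below.
_SEPARATOR = re.compile(r"From (?P<addr>[^\s@]+@[^\s@]+) (?P<date>.+)")

# ctime layout without timezone, e.g. "Wed Jun 30 21:49:08 1993".
_CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def is_separator(line):
    """Return True if line looks like a full mbox separator line."""
    match = _SEPARATOR.fullmatch(line.rstrip("\r\n"))
    if match is None:
        return False
    try:
        time.strptime(match.group("date"), _CTIME_FORMAT)
    except ValueError:
        return False
    return True


def message_starts(lines):
    """Return indexes of lines that start a message.

    A message starts at a full separator line that is either the first
    line or directly preceded by an empty line.
    """
    starts = []
    prev_blank = True  # the first line needs no preceding empty line
    for index, line in enumerate(lines):
        if prev_blank and is_separator(line):
            starts.append(index)
        prev_blank = line.strip("\r\n") == ""
    return starts
```

With this check, a body line such as "From here on, ..." after an empty line 
is not treated as a message boundary, while a plain "\n\nFrom " search would 
split the message there.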

I think this should be the default mode for reading mbox files. See #13698 for 
adding support for other formats.

----------
components: +email
nosy: +barry
resolution: invalid -> 
stage: committed/rejected -> 
status: closed -> open

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue11728>
_______________________________________