-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Nov 5, 2008, at 6:39 PM, Glenn Linderman wrote:

This is an interesting perspective... "stuff em" does come to mind :)

But I'm not at all clear on what you mean by a round-trip through the email module. Let me see... if you are creating an email, you (1) should encode it properly (2) a round-trip is mostly meaningless, unless you send it to yourself. So you probably mean email that is received, and that you want to send on. In this case, there is already a composed/encoded form of the email in hand; it could simply be sent as is without decoding or re-encoding. That would be quite a clean round-trip!

There are two ways to create an email DOM. One is out of whole cloth (i.e. creating Message objects and their subclasses, then attaching them into a tree). Note that it is the "generator" whose job it is to take the DOM and produce an RFC-compliant flat textual representation.
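
As a rough illustration of the "whole cloth" path, here is a minimal sketch using the standard library email package as it exists in current Python 3. The addresses and the build_and_flatten name are made up for the example.

```python
from io import StringIO
from email.generator import Generator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_and_flatten():
    """Build a small message tree by hand, then flatten it to text."""
    # Root of the tree: a multipart container with a few headers.
    root = MIMEMultipart()
    root["From"] = "sender@example.com"        # hypothetical address
    root["To"] = "recipient@example.com"       # hypothetical address
    root["Subject"] = "Built from whole cloth"

    # Attach one text/plain leaf to the tree.
    root.attach(MIMEText("Hello, world.\n", "plain"))

    # The generator walks the tree and writes the flat representation.
    out = StringIO()
    Generator(out, mangle_from_=False).flatten(root)
    return root, out.getvalue()


root, flat = build_and_flatten()
print(flat)
```

The model objects only hold the tree; the boundary string and the final header layout are filled in when the generator flattens it.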

The other way to get a DOM is to parse some flat textual representation. In this case, it is a core design requirement that the parser never throws an exception, and that there is a way to record and retrieve the defects in a message.
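
A small sketch of what that looks like with the standard library parser in current Python 3, under its default policy. The sample message and the parse_and_report helper are invented for the example; the message declares a multipart boundary that never appears in the body.

```python
import email

# A deliberately broken message: the declared boundary never shows up.
BROKEN = (
    "From: a@example.com\n"
    "Subject: broken multipart\n"
    'Content-Type: multipart/mixed; boundary="XXX"\n'
    "\n"
    "There is no boundary line anywhere in this body.\n"
)


def parse_and_report(raw):
    """Parse raw text; return the message and the names of its defects."""
    msg = email.message_from_string(raw)   # does not raise on bad input
    names = [type(defect).__name__ for defect in msg.defects]
    return msg, names


msg, defect_names = parse_and_report(BROKEN)
print(defect_names)
```

The parse succeeds and still returns a usable Message; the problems are recorded on msg.defects for any caller that cares to look.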

The core model objects of Message (and their MIME subclasses) and Header should treat everything internally as bytes. The edges are where you want to be able to accept varying types, but always convert to bytes internally. Edges of this system include the parser, the generator, and various setter and getter methods of Message and Header.

The current code has a strong desire to be idempotent, so that parser->DOM->generator output is exactly the same as input. Small changes to the DOM or content in between should have minimal effect. For example, if you delete a header and then add it back, the header will show up at the end of the RFC 2822 header list, but everything else about the message will be unchanged.
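
To make the round trip and the header example concrete, here is a minimal sketch against the standard library email package in current Python 3. The sample message is invented, and it is a simple, well-formed one; this is not a claim about every possible input.

```python
import email

RAW = (
    "From: a@example.com\n"
    "Subject: round trip\n"
    "To: b@example.com\n"
    "\n"
    "Body text.\n"
)

# parser -> DOM -> generator with no changes in between.
msg = email.message_from_string(RAW)
unchanged = msg.as_string()

# Delete a header and add it back: it moves to the end of the headers.
subject = msg["Subject"]
del msg["Subject"]
msg["Subject"] = subject
moved = msg.as_string()

print(unchanged == RAW)
print(list(msg.keys()))
```

Only the position of the Subject header differs between the two outputs; the other headers and the body come out as they went in.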

Currently idempotency is broken for defective messages. The generator is guaranteed to produce RFC-compliant output, repairing defects like missing boundaries and such.

I guess I'm not terribly concerned about the readability of improperly encoded email messages, whether they are spam or ham. For the purposes of SpamBayes (which I assume is similar to SpamAssassin, only written in Python), it doesn't matter if the data is readable, only that it is recognizably similar. So a consistent mis-transliteration is as good as a correct decoding.

The key thing is that parse should never ever raise an exception. We've learned the hard way that this is the most practical thing because at the level most parsing happens, you really cannot handle any errors.

For ham, the correspondent should be informed that there are problems with their software, so that they can upgrade or reconfigure it.

That's a practical impossibility in real-world applications, as is simply discarding malformed messages. Email sucks.

- -Barry


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iQCVAwUBSRMjE3EjvBPtnXfVAQKMYAP/VbzETAnCegJavJ4zIB37hbWBWmp4yClY
RRzdTXQQY8VxFioxlVwHaxa7AHW/xADsFEkOsm0saWnld4pbu9m00T6KccAOp3eY
BbqXUixFRR6DmyiuLk+0F/cBlgnPH8y3XnlTXsEdXS2za5tW6YoyCsfTu9xGl0Qp
aC7ta6xcvNk=
=NgCu
-----END PGP SIGNATURE-----
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000