-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Nov 5, 2008, at 6:39 PM, Glenn Linderman wrote:
This is an interesting perspective... "stuff em" does come to mind :)
But I'm not at all clear on what you mean by a round-trip through
the email module. Let me see... if you are creating an email, you
(1) should encode it properly (2) a round-trip is mostly
meaningless, unless you send it to yourself. So you probably mean
email that is received, and that you want to send on. In this case,
there is already a composed/encoded form of the email in hand; it
could simply be sent as is without decoding or re-encoding. That
would be quite a clean round-trip!
There are two ways to create an email DOM. One is out of whole cloth
(i.e. creating Message objects and their subclasses, then attaching
them into a tree). Note that it is a "generator" whose job it is to
take the DOM and produce an RFC-compliant flat textual representation.
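
A minimal sketch of the "whole cloth" route, assuming the standard-library
email package as it ships in current Python 3 (not necessarily the API as it
stood when this was written). The addresses, subject and body text are
made-up placeholders.

```python
from io import StringIO
from email.generator import Generator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build the tree by hand: a multipart root with two text leaves attached.
root = MIMEMultipart()
root["From"] = "sender@example.com"      # placeholder address
root["To"] = "recipient@example.com"     # placeholder address
root["Subject"] = "Hello"
root.attach(MIMEText("First part", "plain"))
root.attach(MIMEText("<p>Second part</p>", "html"))

# The generator walks the tree and writes the flat textual form.
out = StringIO()
Generator(out, mangle_from_=False).flatten(root)
flat = out.getvalue()
```

Here `flat` holds the headers followed by the boundary-delimited parts; the
boundary string itself is chosen by the generator at flatten time.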
The other way to get a DOM is to parse some flat textual
representation. In this case, it is a core design requirement that
the parser never throws an exception, and that there is a way to
record and retrieve the defects in a message.
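
As an illustration of that requirement, here is a sketch against the email
package in current Python 3, under its default policy. The input is a
deliberately broken multipart message (it declares a boundary that never
appears); the message text is invented for the example.

```python
import email

# Declares a multipart boundary, but the body never uses it.
raw = (
    "From: a@example.com\n"
    'Content-Type: multipart/mixed; boundary="XXX"\n'
    "\n"
    "no boundary lines appear anywhere in this body\n"
)

# Parsing does not raise; the problems are recorded on the message instead.
msg = email.message_from_string(raw)
defect_names = [type(d).__name__ for d in msg.defects]
```

The caller can then inspect `msg.defects` at whatever level is actually able
to do something about them.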
The core model objects of Message (and their MIME subclasses) and
Header should treat everything internally as bytes. The edges are
where you want to be able to accept varying types, but always convert
to bytes internally. Edges of this system include the parser, the
generator, and various setter and getter methods of Message and Header.
The current code has a strong desire to be idempotent, so that
parser->DOM->generator output is exactly the same as input. Small changes
to the DOM or content in between should have minimal effect. For
example, if you delete a header and then add it back, the header will
show up at the end of the RFC 2822 header list, but everything else
about the message will be unchanged.
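
A small sketch of both behaviours, again assuming the email package in
current Python 3 with its default policy and a simple, well-formed message
(the message text is invented for the example):

```python
import email

raw = (
    "From: a@example.com\n"
    "Subject: round trip\n"
    "To: b@example.com\n"
    "\n"
    "Body text.\n"
)

# parser -> DOM -> generator, with no changes in between.
msg = email.message_from_string(raw)
round_tripped = msg.as_string()

# Delete a header and add it back: it moves to the end of the header list.
subject = msg["Subject"]
del msg["Subject"]
msg["Subject"] = subject
header_order = msg.keys()
```

For this input `round_tripped` matches `raw`, and after the delete and re-add
the header order becomes From, To, Subject.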
Currently idempotency is broken for defective messages. The generator
is guaranteed to produce RFC-compliant output, repairing defects like
missing boundaries and such.
I guess I'm not terribly concerned about the readability of
improperly encoded email messages, whether they are spam or ham.
For the purposes of SpamBayes (which I assume is similar to
spamassassin, only written in Python), it doesn't matter if the data
is readable, only that it is recognizably similar. So a consistent
mis-transliteration is as good as a correct decoding.
The key thing is that parse should never ever raise an exception.
We've learned the hard way that this is the most practical thing
because at the level most parsing happens, you really cannot handle
any errors.
For ham, the correspondent should be informed that there are
problems with their software, so that they can upgrade or
reconfigure it.
That's a practical impossibility in real-world applications, as is
simply discarding malformed messages. Email sucks.
- -Barry
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
iQCVAwUBSRMjE3EjvBPtnXfVAQKMYAP/VbzETAnCegJavJ4zIB37hbWBWmp4yClY
RRzdTXQQY8VxFioxlVwHaxa7AHW/xADsFEkOsm0saWnld4pbu9m00T6KccAOp3eY
BbqXUixFRR6DmyiuLk+0F/cBlgnPH8y3XnlTXsEdXS2za5tW6YoyCsfTu9xGl0Qp
aC7ta6xcvNk=
=NgCu
-----END PGP SIGNATURE-----
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000