Sorry, this one scrolled off the top, and I didn't read it before
sending my other reply.
On approximately 11/6/2008 9:02 AM, came the following characters from
the keyboard of Barry Warsaw:
On Nov 5, 2008, at 6:39 PM, Glenn Linderman wrote:
This is an interesting perspective... "stuff em" does come to mind :)
But I'm not at all clear on what you mean by a round-trip through the
email module. Let me see... if you are creating an email, (1) you
should encode it properly, and (2) a round-trip is mostly meaningless,
unless you send it to yourself. So you probably mean email that is
received, and that you want to send on. In this case, there is
already a composed/encoded form of the email in hand; it could simply
be sent as is without decoding or re-encoding. That would be quite a
clean round-trip!
There are two ways to create an email DOM. One is out of whole cloth
(i.e. creating Message objects and their subclasses, then attaching them
into a tree). Note that it is a "generator" whose job it is to take the
DOM and produce an RFC-compliant flat textual representation.
I grok this one; but think that for the generator, keeping things in
Unicode until the last minute could be useful. Maybe not as useful as
converting immediately to bytes, though, to reduce the amount of
duplicated code.
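
As a rough sketch of the "whole cloth" path described above, using the
current standard-library email package: build a small tree of Message
subclasses, then hand it to a generator to get the flat text. The
addresses, subject, and body are placeholders made up for illustration,
not anything from this thread.

```python
import io
from email.generator import Generator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build the tree by hand: a multipart root with a single text leaf.
root = MIMEMultipart()
root["From"] = "sender@example.com"       # placeholder address
root["To"] = "recipient@example.com"      # placeholder address
root["Subject"] = "Round-trip test"       # placeholder subject
root.attach(MIMEText("Hello, world.\n", "plain", "us-ascii"))

# The generator walks the tree and writes the flat textual form.
buffer = io.StringIO()
Generator(buffer, mangle_from_=False).flatten(root)
flat_text = buffer.getvalue()
```

Here the generator writes text; whether it should stay in Unicode until
the last minute or work in bytes throughout is exactly the open question
in this exchange.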
The other way to get a DOM is to parse some flat textual
representation. In this case, it is a core design requirement that the
parser never throws an exception, and that there is a way to record and
retrieve the defects in a message.
Sure, this makes sense. My other message suggested keeping the message
flat, and using cached pointers and lengths. Of course, editing with
such a technique could be a problem, because the pointers would have to
be updated. A MIME-mimicking tree of flat subchunks comes to mind...
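
The "never throws, records defects" requirement can be seen with a
deliberately broken message. This is only a sketch against the current
standard-library parser with its default policy; the message text is
invented, and which defect classes get recorded may vary between
versions.

```python
import email

# A multipart message whose body never contains the declared boundary.
raw = (
    "From: sender@example.com\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "This body never uses the declared boundary.\n"
)

# Parsing does not raise; problems are attached to the message instead.
msg = email.message_from_string(raw)
defect_names = [type(defect).__name__ for defect in msg.defects]
```

The caller can inspect `msg.defects` afterwards and decide what to do,
rather than having to catch an exception in the middle of parsing.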
The core model objects of Message (and their MIME subclasses) and Header
should treat everything internally as bytes. The edges are where you
want to be able to accept varying types, but always convert to bytes
internally. Edges of this system include the parser, the generator, and
various setter and getter methods of Message and Header.
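
To make the "bytes inside, flexible at the edges" idea concrete, here is
a purely hypothetical sketch. Neither `to_bytes` nor `ByteHeader` exists
in the email package; both names, and the choice of UTF-8 as the default
charset, are assumptions for illustration only.

```python
def to_bytes(value, charset="utf-8"):
    """Hypothetical edge helper: accept str or bytes, always return bytes."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(charset)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


class ByteHeader:
    """Hypothetical header object that stores only bytes internally."""

    def __init__(self, name, value, charset="utf-8"):
        # Setter edge: convert whatever the caller passed into bytes.
        self._name = to_bytes(name, "ascii")
        self._value = to_bytes(value, charset)
        self._charset = charset

    def as_bytes(self):
        # Internal form, with no conversion needed.
        return self._name + b": " + self._value

    def as_text(self):
        # Getter edge: decode only when the caller asks for text.
        return self._value.decode(self._charset)


header = ByteHeader("Subject", "caf\u00e9")
```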
The current code has a strong desire to be idempotent, so that
parser->DOM->generator output is exactly the same as input. Small
changes to the DOM or content in between should have minimal effect.
For example, if you delete a header and then add it back, the header
will show up at the end of the RFC 2822 header list, but everything else
about the message will be unchanged.
Ah, this is your definition of idempotent! Which is what I expected,
but wasn't sure.
This is reasonable. One _could_ even convince the header to show up in
the original spot, if you keep a NULL header placeholder around for
deleted headers.... that would vanish only when regenerating.
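
A small sketch of that round-trip behaviour with the current
standard-library email package, for a simple well-formed message (the
message text is made up for illustration):

```python
import email

raw = (
    "From: sender@example.com\n"
    "Subject: Round-trip test\n"
    "To: recipient@example.com\n"
    "\n"
    "Body text.\n"
)

# parser -> DOM -> generator, with no edits in between.
msg = email.message_from_string(raw)
unchanged = msg.as_string()

# Delete a header and add it back: it is re-appended at the end.
subject = msg["Subject"]
del msg["Subject"]
msg["Subject"] = subject
header_order = msg.keys()
```

For this input `unchanged` compares equal to `raw`, and after the
delete-and-restore the header order becomes From, To, Subject while the
body is untouched.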
Currently idempotency is broken for defective messages. The generator
is guaranteed to produce RFC-compliant output, repairing defects like
missing boundaries and such.
So it seems you are happy with this level of "fixing" things?
I guess I'm not terribly concerned about the readability of improperly
encoded email messages, whether they are spam or ham. For the
purposes of SpamBayes (which I assume is similar to spamassassin, only
written in Python), it doesn't matter if the data is readable, only
that it is recognizably similar. So a consistent mis-transliteration
is as good as a correct decoding.
The key thing is that parse should never ever raise an exception. We've
learned the hard way that this is the most practical thing because at
the level most parsing happens, you really cannot handle any errors.
So you don't have a goal to make mangled, multi-character encodings
suddenly be readable via the email lib? Only to provide the data in raw
form, so that Mr. Turnbull can implement that on top, in emacs?
For ham, the correspondent should be informed that there are problems
with their software, so that they can upgrade or reconfigure it.
That's a practical impossibility in real-world applications, as is
simply discarding malformed messages. Email sucks.
I agree it is impossible to do that automatically. But if a
correspondent suddenly gets broken software, I attempt to inform them of
that... and as long as their email address comes through, I can...
And I don't think I've ever proposed discarding malformed messages; just
transliterating them in some way that (drum roll) doesn't cause
exceptions...
Sorry, I wrote a bit before looking at the API, which is more robust than
I expected from Mr. Turnbull's writings. I am curious which API
deficiencies have been identified so far... is there a list somewhere?
My summary tried to be a start on that, or an augmentation. Seems I
tried to get to bug# last night, but the 'net wasn't responsive. Can't
find the number now, in a quick look through the messages in this thread.
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000