Sorry, this one scrolled off the top, and I didn't read it before sending my other reply.

On approximately 11/6/2008 9:02 AM, came the following characters from the keyboard of Barry Warsaw:

On Nov 5, 2008, at 6:39 PM, Glenn Linderman wrote:

This is an interesting perspective... "stuff em" does come to mind :)

But I'm not at all clear on what you mean by a round-trip through the email module. Let me see... if you are creating an email, you (1) should encode it properly (2) a round-trip is mostly meaningless, unless you send it to yourself. So you probably mean email that is received, and that you want to send on. In this case, there is already a composed/encoded form of the email in hand; it could simply be sent as is without decoding or re-encoding. That would be quite a clean round-trip!

There are two ways to create an email DOM. One is out of whole cloth (i.e. creating Message objects and their subclasses, then attaching them into a tree). Note that it is a "generator" whose job it is to take the DOM and produce an RFC-compliant flat textual representation.
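
As a rough illustration of the "whole cloth" route, here is a minimal sketch using the email package as it ships in current Python 3, which may differ in detail from the 2008 code under discussion. The addresses and subject are made up for the example.

```python
from email.generator import Generator
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from io import StringIO

# Build the tree by hand: a multipart container with one text part attached.
root = MIMEMultipart()
root["From"] = "sender@example.com"       # illustrative address
root["To"] = "recipient@example.com"      # illustrative address
root["Subject"] = "Built from whole cloth"
root.attach(MIMEText("Hello from a hand-built message tree.\n"))

# The generator walks the tree and writes the flat text form,
# inventing a MIME boundary because none was supplied.
buffer = StringIO()
Generator(buffer).flatten(root)
flat_text = buffer.getvalue()
```

Nothing in the tree itself is flat text; the boundary lines and header layout only come into existence when the generator runs.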


I grok this one; but think that for the generator, keeping things in Unicode until the last minute could be useful. Maybe not as useful as converting immediately to bytes, though, to reduce the amount of duplicated code.


The other way to get a DOM is to parse some flat textual representation. In this case, it is a core design requirement that the parser never throws an exception, and that there is a way to record and retrieve the defects in a message.
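
A small sketch of that behaviour, again assuming the email package in current Python 3 rather than the 2008 code: a message that claims to be multipart but names no boundary still parses, and the problem is recorded on the message instead of being raised. The sample message is invented for the example.

```python
import email

# Defective input: multipart content type with no boundary parameter.
raw = (
    "From: sender@example.com\n"
    "Content-Type: multipart/mixed\n"
    "\n"
    "This claims to be multipart but has no boundary.\n"
)

# Parsing does not raise; the defects are attached to the message object.
msg = email.message_from_string(raw)
defect_names = [type(d).__name__ for d in msg.defects]

# With no boundary to split on, the body is kept as one plain string payload.
payload = msg.get_payload()
```

The caller can inspect `msg.defects` afterwards and decide what to do, which is much easier than handling an exception in the middle of parsing.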


Sure, this makes sense. My other message suggested keeping the message flat, and using cached pointers and lengths. Of course, editing with such a technique could be a problem, because the pointers would have to be updated. A MIME-mimicking tree of flat subchunks comes to mind...


The core model objects of Message (and their MIME subclasses) and Header should treat everything internally as bytes. The edges are where you want to be able to accept varying types, but always convert to bytes internally. Edges of this system include the parser, the generator, and various setter and getter methods of Message and Header.

The current code has a strong desire to be idempotent, so that parser->DOM->generator output is exactly the same as input. Small changes to the DOM or content in between should have minimal effect. For example, if you delete a header and then add it back, the header will show up at the end of the RFC 2822 header list, but everything else about the message will be unchanged.
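
The round-trip property and the header example can be sketched as follows, assuming the email package in current Python 3 and a deliberately simple, well-formed message (made up for the example):

```python
import email

raw = (
    "From: sender@example.com\n"
    "To: recipient@example.com\n"
    "Subject: round trip\n"
    "\n"
    "Body text.\n"
)

# parser -> DOM -> generator, with no edits in between.
msg = email.message_from_string(raw)
unchanged = msg.as_string()
original_order = msg.keys()

# Delete a header and add it back: it reappears at the end of the list.
to_value = msg["To"]
del msg["To"]
msg["To"] = to_value
new_order = msg.keys()
edited = msg.as_string()
```

For this input `unchanged` is identical to `raw`, while `edited` differs only in where the To header sits.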


Ah, this is your definition of idempotent! Which is what I expected, but wasn't sure.

This is reasonable. One _could_ even convince the header to show up in the original spot, if you keep a NULL header placeholder around for deleted headers.... that would vanish only when regenerating.
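
To make the placeholder idea concrete, here is a purely hypothetical sketch. `HeaderList` and its methods are invented for illustration and are not part of the email package; a deleted header leaves a `None` placeholder that is reused if the header is set again and skipped when flattening.

```python
class HeaderList:
    """Hypothetical ordered header store that remembers deleted positions."""

    def __init__(self, pairs):
        # Each entry is (name, value); value None marks a deleted header.
        self._items = list(pairs)

    def delete(self, name):
        # Keep the slot, drop the value.
        self._items = [
            (n, None) if n.lower() == name.lower() else (n, v)
            for n, v in self._items
        ]

    def set(self, name, value):
        # Reuse a placeholder slot if one exists, otherwise append.
        for i, (n, v) in enumerate(self._items):
            if n.lower() == name.lower() and v is None:
                self._items[i] = (n, value)
                return
        self._items.append((name, value))

    def flatten(self):
        # Placeholders vanish only here, when regenerating.
        return "".join(f"{n}: {v}\n" for n, v in self._items if v is not None)


headers = HeaderList([("From", "a@example.com"),
                      ("To", "b@example.com"),
                      ("Subject", "hi")])
headers.delete("To")
without_to = headers.flatten()
headers.set("To", "c@example.com")
restored = headers.flatten()
```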


Currently idempotency is broken for defective messages. The generator is guaranteed to produce RFC-compliant output, repairing defects like missing boundaries and such.
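
A sketch of one such repair, assuming the email package in current Python 3: the invented message below opens a MIME part but never writes the closing boundary line. It parses without error, and the regenerated text gains the terminator, so output no longer equals input.

```python
import email

# Defective input: the part starts at "--XX" but "--XX--" never appears.
raw = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
)

msg = email.message_from_string(raw)
defect_count = len(msg.defects)

# The generator writes a well-formed multipart, closing boundary included.
repaired = msg.as_string()
```

Here `repaired` ends with the `--XX--` line that `raw` lacked, which is the kind of "fixing" that breaks exact round-tripping for defective messages.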


So it seems you are happy with this level of "fixing" things?


I guess I'm not terribly concerned about the readability of improperly encoded email messages, whether they are spam or ham. For the purposes of SpamBayes (which I assume is similar to spamassassin, only written in Python), it doesn't matter if the data is readable, only that it is recognizably similar. So a consistent mis-transliteration is as good as a correct decoding.

The key thing is that parse should never ever raise an exception. We've learned the hard way that this is the most practical thing because at the level most parsing happens, you really cannot handle any errors.


So you don't have a goal to make mangled, multi-character encodings suddenly be readable via the email lib? Only to provide the data in raw form, so that Mr. Turnbull can implement that on top, in emacs?


For ham, the correspondent should be informed that there are problems with their software, so that they can upgrade or reconfigure it.

That's a practical impossibility in real-world applications, as is simply discarding malformed messages. Email sucks.


I agree it is impossible to do that automatically. But if a correspondent suddenly gets broken software, I attempt to inform them of that... and as long as their email address comes through, I can...

And I don't think I've ever proposed discarding malformed messages; just transliterating them in some way that (drum roll) doesn't cause exceptions...

Sorry, I wrote a bit before looking at the API, which is more robust than I had expected from Mr. Turnbull's writings. I am curious which API deficiencies have been identified so far... is there a list somewhere?

My summary tried to be a start on that, or an augmentation. It seems I tried to get to the bug number last night, but the 'net wasn't responsive. I can't find the number now, in a quick look through the messages in this thread.

--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking