On Thu, 20 Jan 2011 17:58:36 -0500, Bob Kline wrote:

> Thanks. I'm not sure everyone would agree that it's OK to collapse
> multiple consecutive spaces into one, but I'm beginning to suspect that
> those more concerned with preserving as much as possible of the original
> message are in the minority. It sounds like my take-home distillation
> from this thread is "yes, the module ignores what the spec says about
> unfolding, but it doesn't matter." I guess I can live with that.
>
I've been doing stuff in this area with the JavaMail package, though not as yet in Python. I've learnt that when you parse the headers you can extract values that work well for comparisons, as database keys, etc., but they are not guaranteed to let you reconstitute the original header byte for byte. If preserving the message exactly as received matters, the solution is to parse the message to extract the headers and MIME parts the application needs to carry out its function, but keep the original, unparsed message so you can pass it on.
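
For what it's worth, here is a rough Python sketch of that "parse for the values, keep the original bytes" approach. It assumes Python 3's standard email package with policy.default. The names StoredMessage and ingest are made up for illustration and have nothing to do with the JavaMail code mentioned above.

```python
from dataclasses import dataclass, field
from email import message_from_bytes, policy


@dataclass
class StoredMessage:
    """Hypothetical record: parsed values for lookups, raw bytes for forwarding."""
    raw: bytes                      # the message exactly as received, never modified
    message_id: str                 # parsed value, usable as a database key
    subject: str                    # unfolded subject, fine for comparisons
    recipients: list = field(default_factory=list)


def ingest(raw: bytes) -> StoredMessage:
    # Parse a copy of the data; the parser's view is used only to extract values.
    msg = message_from_bytes(raw, policy=policy.default)

    to_header = msg["To"]
    recipients = (
        [addr.addr_spec for addr in to_header.addresses]
        if to_header is not None
        else []
    )

    return StoredMessage(
        raw=raw,  # keep the original so it can be passed on untouched
        message_id=str(msg["Message-ID"] or "").strip(),
        subject=str(msg["Subject"] or ""),
        recipients=recipients,
    )


if __name__ == "__main__":
    sample = (
        b"Message-ID: <abc123@example.com>\r\n"
        b"To: Alice <alice@example.com>,\r\n"
        b"  Bob <bob@example.com>\r\n"
        b"Subject: unfolding   test\r\n"
        b"  continued\r\n"
        b"\r\n"
        b"Body text.\r\n"
    )
    stored = ingest(sample)
    print(stored.recipients)
    print(repr(stored.subject))
    # Forward stored.raw, not a re-serialised copy of the parsed message:
    # re-serialising is not guaranteed to reproduce the input byte for byte.
```

The parsed fields are only for the application's own use; anything that has to go back out on the wire comes from the raw attribute.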

The other gotcha is assuming that the MUA author read and understood the RFCs. Very many barely glanced at them and/or misunderstood them. Consequently, if you use strict parsing you'll be surprised how many messages get rejected for having invalid headers or MIME headers. For instance, the mistakes some MUAs make when outputting To, CC and BCC headers with multiple addresses have to be seen to be believed. If the Python e-mail module lets you, set it to use lenient parsing. If this isn't an option you may well find yourself having to fix up messages before you can parse them successfully.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |