On Jan 20, 9:55 am, Bob Kline <bkl...@rksystems.com> wrote:
> On 1/20/2011 12:23 PM, Carl Banks wrote:
> > On Jan 20, 7:08 am, Bob Kline<bkl...@rksystems.com> wrote:
> >> I just noticed that the following passage in RFC 822:
> >>
> >>     The process of moving from this folded multiple-line
> >>     representation of a header field to its single line
> >>     representation is called "unfolding". Unfolding is
> >>     accomplished by regarding CRLF immediately followed by a
> >>     LWSP-char as equivalent to the LWSP-char.
> >>
> >> is not being honored by the email module. The following two invocations
> >> of message_from_string() should return the same value, but that's not
> >> what happens:
> >>
> >> >>> import email
> >> >>> email.message_from_string("Subject: blah").get('SUBJECT')
> >> 'blah'
> >> >>> email.message_from_string("Subject:\n blah").get('SUBJECT')
> >> ' blah'
> >>
> >> Note the space in front of the second value returned, but missing from
> >> the first. Can someone convince me that this is not a bug?
> >
> > That's correct, according to my reading of RFC 822 (I doubt it's
> > changed so I didn't bother to look up what the latest RFC on that
> > subject is.)
> >
> > The RFC says that in a folded line the whitespace on the following
> > line is considered a part of the line.
>
> Thanks for responding. I think your interpretation of the RFC is the
> same as mine. What I'm saying is that by not returning the same value
> in the two cases above the module is not "regarding CRLF immediately
> followed by a LWSP-char as equivalent to the LWSP-char."

That makes sense. The space after \n is part of the reconstructed
subject, and the email module should have treated it the same as if the
line hadn't been folded.

I agree that it's a bug. The line-folding needs to be moved earlier in
the parse process.

Carl Banks
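
To illustrate what "moving the unfolding earlier" could look like, here is a minimal sketch. It is not the email module's actual code: unfold() and parse_header() are hypothetical helper names, and accepting a bare \n as well as CRLF is my assumption, made only because the example in this thread uses "\n". The idea is to unfold the raw header first and only then split off and trim the value, so the folded and unfolded forms yield the same result.

```python
import re

# A line break (CRLF per RFC 822; bare LF accepted here as an assumption)
# that is immediately followed by a LWSP-char (space or horizontal tab).
_FOLD = re.compile(r"\r?\n(?=[ \t])")


def unfold(raw_header):
    """Drop each line break that is followed by a space or tab.

    The whitespace character itself is kept, which is what "regarding
    CRLF immediately followed by a LWSP-char as equivalent to the
    LWSP-char" amounts to.
    """
    return _FOLD.sub("", raw_header)


def parse_header(raw_header):
    """Hypothetical parser: unfold first, then split name from value."""
    name, _, value = unfold(raw_header).partition(":")
    # Trim leading whitespace and any trailing line ending only after
    # unfolding, so both forms of the header are treated alike.
    return name, value.lstrip(" \t").rstrip("\r\n")


if __name__ == "__main__":
    print(parse_header("Subject: blah"))
    print(parse_header("Subject:\n blah"))
```

With this ordering both calls return the same (name, value) pair, because the leading space of the continuation line is trimmed exactly as it would be on an unfolded line.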