> On 8/17/07, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > Ideally, the package would be well suited not only for wire-to-wire
> > > and all-internal uses, but also related domains like HTTP and other
> > > RFC 2822-like contexts.
> >
> > But that's exactly why the internal representation should be bytes,
> > not strings.  HTTP's use of MIME, for instance, uses "binary" quite a
> > lot.
>
> In the specific case of HTTP, it certainly looks like the headers are
> represented on the wire as 7-bit ASCII and could be treated as bytes
> or strings by the header processing code it uses via rfc822.py. The
> actual body of the response should still be represented as bytes,
> which can be converted to strings by the application.

Note that, in the case of HTTP, both the request message and the
response message may contain MIME-tagged binary data.  And some of the
header values for those message types may contain arbitrary ISO-8859-1
octets, not necessarily encoded.  See sections 4.2 and 2.2 of RFC 2616.

But we're not really interested in those message headers -- that's a
consideration for the HTTP libraries.  I'm just concerned about the
MIME standard, which both HTTP and email use, though in different ways.
The MIME processing in the "email" module must follow the MIME specs
(RFC 2045, 2046, etc.) rather than assume RFC 2821 (SMTP) and RFC 2822
encoding everywhere.  SMTP is only one form of message envelope.

The important thing is that we understand that raw mail messages -- say
in MH format in a file -- do not consist of "lines" of "text"; they are
complicated binary data structures, often largely composed of pieces of
text encoded in very specific ways.  As such, the raw message *must* be
treated as a sequence of bytes.  And the content of any body part may
also be an arbitrary sequence of bytes (which, in an RFC 2822 context,
must be encoded into ASCII octets).  The value of any header may be an
arbitrary string in an arbitrary language in an arbitrary character set
(see RFCs 2047 and 2231), though it must be put into the message
encoded as a sequence of octets drawn from a set that happens to be a
subset of the ASCII octets.

Maybe all of this argues for separating "mime" and "email" into two
different packages.  And maybe renaming "email" to "internet-email" or
"rfc2822-email".

Bill
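
A minimal sketch of that byte-oriented view, assuming the email package
as it later shipped in Python 3 (message_from_bytes, 3.2 and up).  The
sample message is invented for illustration; the header is an RFC 2047
encoded-word and the body is base64, so the raw message is pure ASCII
octets even though neither the header text nor the body is ASCII:

```python
from email import message_from_bytes
from email.header import decode_header, make_header

# Hypothetical raw message, as it might sit in an MH-style file.
# It is a byte string; every octet happens to fall within ASCII.
RAW_MESSAGE = (
    b"Subject: =?iso-8859-1?q?Caf=E9?=\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAH/gA==\r\n"
)

# Parse from bytes; no text decoding of the message as a whole is assumed.
msg = message_from_bytes(RAW_MESSAGE)

# Header value: stored as an RFC 2047 encoded-word, decoded to text on request.
subject = str(make_header(decode_header(msg["Subject"])))

# Body part: arbitrary octets, recovered by undoing the base64 transfer encoding.
body = msg.get_payload(decode=True)

print(repr(subject))
print(repr(body))
```

The parser is handed bytes and hands back bytes for the body; only the
header value, once its declared charset is known, becomes a string.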
