On Tue, Jul 29, 2008 at 4:04 PM, James Y Knight <[EMAIL PROTECTED]> wrote:
>> So it is MIME, right?
>
> No: RFC 2388 says it is MIME, but in real life it is not. RFC 2388 is wrong.

I think this is a problem of semantics: what you mean by "wrong". The RFC is
not wrong in the sense of containing a technical inaccuracy or needing an
erratum. None have been issued so far, by the way:
http://www.rfc-editor.org/errata_search.php?rfc=2388

It may be "wrong" only in the sense that authors of software ignore it. I'd
tend to use a different, less misleading term though. I think it more
appropriate to call the software that purports to adhere to HTTP 1.1 (and
hence its dependent specs, like RFC 2388) "wrong".

>> Now you can successfully argue that many user agents do not
>> follow the RFC carefully enough. But that's not a problem with
>> the RFC itself.
>
> Common practice is by now long established, and cannot simply be changed 10
> years after the fact to conform to what the standard says it should've been.

I'm not so sure. Granted, this is a problem for the browser guys and not us
Python people.

Regarding timelines: the multipart/form-data RFC 2388 was written in 1998.
HTTP 1.1 came after that. Both of these specs are around 10 years old, while
most browsers today are in fact the newcomers, not the other way around. The
RFC isn't trying to rewrite facts; it came first.

I'm sure there are lots of other places where browsers today do not adhere to
the RFC specs; so do we say the specs are wrong or that the browsers have
bugs? (I'm not talking about W3C stuff; that's clearly not as straightforward
as RFCs.)

> Therefore, it *is* now a problem with the standard: the standard is wrong.
> If you follow it, you're going to create totally broken software.

I don't think we're there. Although many real-world browsers may not conform
strictly to the RFC, I fail to see why that means the server can't conform in
this case. I just don't see "totally broken" as an inevitable outcome.

> For instance, treating form posts as being 7bit unless they have a
> Content-Transfer-Encoding. The RFC says you should do that.

Um, no.
HTTP 1.1 specifically grants an exemption to that 7-bit restriction in MIME.
The tricky part is that with web software you're dealing with a whole bunch
of standards, and even ignoring W3C stuff, there's a whole bunch of RFCs.
Sometimes one RFC will override part of another, and that's what the HTTP RFC
does to the MIME RFCs. Yes, it's confusing and prone to interpretation errors.

> But it's an
> absolutely nonsensical thing to do. Your code would not work with any
> existing web browser if you did. Or, if you're writing a web browser: don't
> even think of using Content-Transfer-Encoding to encode your response.

Again, the RFCs already account for that. In web software, the primary RFC is
the HTTP 1.1 spec, not the MIME spec. This can be confusing because HTTP
borrows, say, 90% of MIME but overrides other parts of it.

So I guess in a pedantic way, yes, this is not strictly "MIME". If it were,
you'd be dealing with email, not web. But inasmuch as it's the parts of MIME
that the HTTP spec says to use, it is still MIME. And the parts we're dealing
with (the multipart/form-data type, and what to do with the presence or
absence of content-type headers on the subparts), well, that is pretty
explicitly stated.

>> Or it should be considered encoded "byte" string?
>
> I'd recommend that it should be, certainly at the lower levels. A higher
> level API can look at the hints available to figure out how to decode the
> non-file fields: e.g.: if the magic _charset_ parameter is present, use
> that, otherwise use what the developer tells you they put in accept-charset
> / what encoding they sent the page in.

I don't think any library should be applying those heuristics. Hasn't
everybody been annoyed by IE's content type sniffing heuristics? This would
be the same idea, but on the server side. Heuristics, though, may be a
perfectly suitable thing for some applications to do.
But you also have to remember that not all HTTP transactions involve
browsers, or even HTML, and that in a standard library, deviations from the
RFC should have explicit consequences in those cases.

I think that perhaps allowing the application to provide an override (a
default content type) as input might be enough in this case, although even
that could be argued. It might be sufficient that the library follow the RFC
strictly; and, well, if the posted data doesn't follow the spec, we raise an
error along with the original byte string and let the application deal with
it.

An override is, I think, a reasonable compromise that allows one to deal with
real-world non-conforming browsers, while not throwing out the RFC or adding
complex, fragile heuristics to the library. You certainly don't want to break
when/if you get a user agent that DOES follow the RFC.

--
Deron Meranda
_______________________________________________
Web-SIG mailing list
Web-SIG@python.org
Web SIG: http://www.python.org/sigs/web-sig
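
To make the override idea above a bit more concrete, here is a minimal sketch of how a library might decode one non-file form field. It is only an illustration under stated assumptions: the names decode_field, default_charset and FieldDecodeError are hypothetical and do not come from any existing library, and falling back to US-ASCII reflects my reading of the MIME default for text/plain parts. No sniffing is done: a charset declared on the part wins, then the application's explicit override, then the default; anything that fails to decode raises an error that carries the original bytes.

```python
from email.message import Message


class FieldDecodeError(ValueError):
    """Hypothetical error type: decoding failed, original bytes are kept."""

    def __init__(self, raw, charset):
        super().__init__(f"cannot decode form field as {charset!r}")
        self.raw = raw          # the untouched byte string from the request
        self.charset = charset  # the charset that was attempted


def decode_field(raw, content_type=None, default_charset=None):
    """Decode one form-data part without guessing (hypothetical helper).

    raw             -- body of the part, as bytes
    content_type    -- value of the part's Content-Type header, or None
    default_charset -- explicit override supplied by the application
    """
    charset = None
    if content_type is not None:
        # Reuse the stdlib header parser to pull out the charset parameter.
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()
    if charset is None:
        # No declaration on the part: use the application's override.
        charset = default_charset
    if charset is None:
        # Assumption: fall back to the MIME default for text/plain.
        charset = "us-ascii"
    try:
        return raw.decode(charset, errors="strict")
    except (UnicodeDecodeError, LookupError) as exc:
        # Undecodable data or unknown charset: hand the bytes back.
        raise FieldDecodeError(raw, charset) from exc
```

An application that knows it served its form page as Latin-1 could call decode_field(raw, part_content_type, default_charset="latin-1"), and catch FieldDecodeError to apply its own heuristics to err.raw if it really wants to.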