Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Deron Meranda
On Tue, Jul 29, 2008 at 4:04 PM, James Y Knight <[EMAIL PROTECTED]> wrote: >> So it is MIME, right? > > No: RFC 2388 says it is MIME, but in real life it is not. RFC 2388 is wrong. I think this is a problem of semantics; what you mean by "wrong". The RFC is not wrong, in terms of it having a tech

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> Common practice is by now long established, and cannot simply be > changed 10 years after the fact to conform to what the standard says > it should've been. Therefore, it *is* now a problem with the standard: > the standard is wrong. If you follow it, you're going to create > totally brok

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> Also I'd say that if you're dealing with text (text/*) and no > charset is provided (or the caller hasn't given an override > default charset); then you must assume US-ASCII. And > you should allow any UnicodeDecodeErrors to bubble > up to the caller. In other words if a user agent sent text >

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> I first try the content-type header, Right. > then the special _charset_ field, I don't know what that is. Can you explain a bit more? > and finally utf-8. That's wrong. Should be ASCII. You could add an "encoding" field to let the application override this, though. But the default is A
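
A minimal sketch of the lookup order discussed in this message: the charset from the Content-Type header first, then the `_charset_` field (as I understand it, a hidden form field that some browsers fill in with the encoding they used for the submission), then a default. The function name `pick_charset`, its parameters, and the bytes-keyed field mapping are hypothetical and not taken from wsgix. The fallback is a parameter because the thread disagrees on whether it should be ASCII or utf-8.

```python
import codecs


def pick_charset(header_charset, raw_fields, default="us-ascii"):
    """Return the first usable charset name, trying the sources in order.

    header_charset: charset parameter from the Content-Type header, or None.
    raw_fields: mapping of undecoded field names (bytes) to values (bytes).
    default: fallback when no source names a codec Python knows about.
    """
    candidates = [header_charset]
    hint = raw_fields.get(b"_charset_")
    if hint:
        # Charset names are plain ASCII; drop anything else.
        candidates.append(hint.decode("ascii", "ignore"))
    for name in candidates:
        if not name:
            continue
        try:
            codecs.lookup(name)  # validate only; raises LookupError if unknown
        except LookupError:
            continue
        return name
    return default
```

An application that wants the utf-8 fallback instead would call `pick_charset(header_charset, raw_fields, default="utf-8")`.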

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> I would think it most useful if the decoding framework would strictly > follow the RFC and assume "text/plain; charset=US-ASCII"; but > also allow the caller some means of indicating a different default. > Obviously, if a user agent does provide a complete Content-Type, > it should be used. Yes,
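
A rough sketch of the behaviour proposed here: assume "text/plain; charset=US-ASCII" when a part carries no Content-Type, let the caller supply a different default, and use the declared charset when there is one. The helper names `charset_from_content_type` and `decode_part` are made up for illustration. Parsing the header with the standard library's `email.message.Message` is my choice, not something the thread prescribes.

```python
from email.message import Message


def charset_from_content_type(content_type, default="us-ascii"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    # A missing header is treated as text/plain, as RFC 2388 specifies.
    msg["Content-Type"] = content_type or "text/plain"
    return msg.get_content_charset(default)


def decode_part(data, content_type=None, default="us-ascii"):
    """Decode the bytes of one form part to str.

    Decoding is strict, so a UnicodeDecodeError propagates to the caller.
    An unknown charset name raises LookupError.
    """
    return data.decode(charset_from_content_type(content_type, default))
```

With these defaults, non-ASCII bytes in a part that has no declared charset raise `UnicodeDecodeError` rather than being guessed at; passing `default="utf-8"` gives the more lenient behaviour.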

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Deron Meranda
On Tue, Jul 29, 2008 at 3:50 PM, Manlio Perillo <[EMAIL PROTECTED]> wrote: > Deron Meranda wrote: >> >> [...] >>> >>> But, at this point, can one consider the content of form post to be >>> encoded >>> "text" string? >>> >>> Or it should be considered encoded "byte" string? >> >> Both/either.

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread James Y Knight
On Jul 29, 2008, at 3:18 PM, Deron Meranda wrote: In what way is RFC 2388 wrong or not MIME? Per RFC 2388 sect. 3: "The media-type multipart/form-data follows the rules of all multipart MIME data streams as outlined in [RFC 2046]." So it is MIME, right? No: RFC 2388 says it is MIME, b

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Manlio Perillo
Deron Meranda wrote: [...] But, at this point, can one consider the content of form post to be encoded "text" string? Or it should be considered encoded "byte" string? Both/either. I'd say follow the RFC, but perhaps allow a caller to provide an override default. So yes, you should ass

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Deron Meranda
On Tue, Jul 29, 2008 at 2:41 PM, Manlio Perillo <[EMAIL PROTECTED]> wrote: > James Y Knight wrote: >> You seem to be under the mistaken impression that form post content is >> MIME. It is not. It looks kinda like it should be, and maybe it's even >> specified to be [rfc2388], but actually trea

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread James Y Knight
On Jul 29, 2008, at 1:14 PM, Bill Janssen wrote: Ok with theory. But in practice: Seems like you're looking at a broken browser there. Can anyone point to where a W3C standard or IETF RFC describes this behavior? You seem to be under the mistaken impression that form post content is MIME.

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Deron Meranda
On Tue, Jul 29, 2008 at 12:39 PM, Manlio Perillo <[EMAIL PROTECTED]> wrote: > Bill Janssen wrote: >> Actually, it's defined for all fields, isn't it? From RFC 2388: >> >> ``As with all multipart MIME types, each part has an optional >> "Content-Type", which defaults to text/plain.'' >> >> So

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Manlio Perillo
James Y Knight wrote: On Jul 29, 2008, at 1:14 PM, Bill Janssen wrote: Ok with theory. But in practice: Seems like you're looking at a broken browser there. Can anyone point to where a W3C standard or IETF RFC describes this behavior? You seem to be under the mistaken impression that

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Manlio Perillo
Bill Janssen wrote: Ok with theory. But in practice: Seems like you're looking at a broken browser there. Right. It's Firefox. But it's the same with IE 6 and Opera. Can anyone point to where a W3C standard or IETF RFC describes this behavior? I think that it is safe to decode data

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> > Ok with theory. > > But in practice: > > Seems like you're looking at a broken browser there. Ah, I see that the Firefox people, at least, are aware that this is a bug in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=116346 But they haven't found a fix for it yet, because of the lar

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> Ok with theory. > But in practice: Seems like you're looking at a broken browser there. Can anyone point to where a W3C standard or IETF RFC describes this behavior? > I think that it is safe to decode data from the QUERY_STRING and POST > data to Unicode, and to return Bad Request in case
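
To make the quoted proposal concrete, here is a small sketch that decodes a query string strictly and signals a 400 when decoding fails. The quoted sentence is cut off, so treating "in case" as "in case of a decoding error" is an assumption on my part. `BadRequest` and `parse_query_string_strict` are hypothetical names. `urllib.parse.parse_qs` is the standard-library parser, called here with `errors="strict"` in place of its default of `"replace"`.

```python
from urllib.parse import parse_qs


class BadRequest(Exception):
    """Hypothetical exception that a framework would map to a 400 response."""

    status = "400 Bad Request"


def parse_query_string_strict(query_string, encoding="utf-8"):
    """Parse a query string into a dict of lists of str.

    Percent-encoded bytes that are not valid in `encoding` raise BadRequest
    rather than being silently replaced.
    """
    try:
        return parse_qs(
            query_string,
            keep_blank_values=True,
            encoding=encoding,
            errors="strict",
        )
    except UnicodeDecodeError as exc:
        raise BadRequest("query string is not valid %s" % encoding) from exc
```

The same function can express either default argued for in the thread by passing `encoding="ascii"` or `encoding="utf-8"`.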

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Manlio Perillo
Bill Janssen wrote: That's probably wrong. We went through this recently on the python-dev list. While it's possible to tell the encoding of multipart/form-data, With multipart/form-data the problem should be the same. The content type is defined only for file fields. Actually, it's de

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-29 Thread Bill Janssen
> > That's probably wrong. We went through this recently on the > > python-dev list. While it's possible to tell the encoding of > > multipart/form-data, > > With multipart/form-data the problem should be the same. > The content type is defined only for file fields. Actually, it's defined for

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Manlio Perillo
Bill Janssen wrote: In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset specified in the POST'ed data (utf-8 or the charset found in the special _charset_ field). That's probably wrong. We went through this recently on the python-dev list. While it's possible to tell the

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Bill Janssen
> In wsgix I use utf-8 for decoding the QUERY_STRING, and the charset > specified in the POST'ed data (utf-8 or the charset found in the special > _charset_ field). That's probably wrong. We went through this recently on the python-dev list. While it's possible to tell the encoding of multipar

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Bill Janssen
> The first parses the query string and returns a dictionary of strings; the > latter parses the application/x-www-form-urlencoded client body and > returns a dictionary of strings and the charset used by the client for > the unicode encoding. > Now, I'm wondering whether these two functions should instead r

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Manlio Perillo
Ian Bicking wrote: Manlio Perillo wrote: Hi. In my WSGI framework: http://hg.mperillo.ath.cx/wsgix I have, in the `http` module, the functions `parse_query_string` and `parse_simple_post_data`. The first parses the query string and returns a dictionary of strings; the latter parses the appl

Re: [Web-SIG] parsing of urlencoded data and Unicode

2008-07-28 Thread Ian Bicking
Manlio Perillo wrote: Hi. In my WSGI framework: http://hg.mperillo.ath.cx/wsgix I have, in the `http` module, the functions `parse_query_string` and `parse_simple_post_data`. The first parses the query string and returns a dictionary of strings; the latter parses the application/x-www-form-urlenc