Much appreciated, Julian!

On Jul 12, 2012, at 1:31 AM, Julian Reschke wrote:

> On 2012-07-09 17:01, Julian Reschke wrote:
>> On 2012-07-09 16:48, Mike Jones wrote:
>>> HTML5 is not cited because it's a working draft - not an approved
>>> standard. In what way is "the definition of the media type in HTML4
>>> is known to be insufficient"? People have been successfully
>>> implementing form-urlencoding with it for quite some time. :-) Is
>>> there a specific wording change that you'd suggest that we make that
>>> doesn't involve citing a working draft, rather than an approved standard?
>>
>> For instance, the HTML4 "definition" doesn't even mention what to do
>> with non-ASCII characters.
>>
>> I understand that it's not particularly attractive, but citing HTML4
>> just because it's a "standard" isn't really helpful for people who
>> actually follow the link and try to understand what needs to be
>> implemented.
>> ...
>
> Here's an attempt to describe the encoding in terms of HTML4, plus additional
> instruction. This would need to be referenced anyway where the spec currently
> refers to the HTML4 media type definition:
>
> -- snip --
> Appendix X. Use of the application/x-www-form-urlencoded Media Type
>
> At the time of publication of this specification, the
> "application/x-www-form-urlencoded" media type was defined in Section 17.13.4
> of [HTML4], but not registered in the IANA media types registry
> (<http://www.iana.org/assignments/media-types/index.html>). Furthermore, the
> definition is incomplete as it does not consider non-US-ASCII characters.
>
> To address this shortcoming, when generating payloads using this media type,
> names and values MUST be encoded using the "UTF-8" character encoding scheme
> ([RFC3629]) first; the resulting octet sequence then needs to be further
> encoded using the escaping rules defined in [HTML4].
>
> When parsing data from a payload using this media type, the names and values
> resulting from reversing the name/value encoding consequently need to be
> treated as octet sequences, to be decoded using the "UTF-8" character
> encoding scheme.
>
> Example: A value consisting of the six Unicode code points (1) U+0020
> (SPACE), (2) U+0025 (PERCENT SIGN), (3) U+0026 (AMPERSAND), (4) U+002B (PLUS
> SIGN), (5) U+00A3 (POUND SIGN), and (6) U+20AC (EURO SIGN) would be encoded
> into the octet sequence below (using hexadecimal notation):
>
>   20 25 26 2B C2 A3 E2 82 AC
>
> and then represented in the payload as:
>
>   +%25%26%2B%C2%A3%E2%82%AC
>
> -- snip --
>
> Best regards, Julian
> _______________________________________________
> OAuth mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/oauth
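
As a rough illustration of the proposed appendix text (UTF-8 first, then the HTML4 Section 17.13.4 escaping on the resulting octets, and the reverse when parsing), here is a minimal Python sketch. The function names encode_component, decode_component, encode_form and decode_form are hypothetical. It assumes a literal reading of HTML4 (only ASCII letters and digits pass through unescaped, space becomes "+", every other octet becomes %HH) and it does not perform the CR LF line-break normalization that HTML4 also requires. For comparison, Python's standard urllib.parse.quote_plus() defaults to UTF-8 and gives the same output for Julian's example, but it additionally leaves "_", ".", "-" and "~" unescaped.

```python
# Sketch only: UTF-8 encode each name/value, then apply HTML4-style escaping.
# Assumption: literal reading of HTML4 17.13.4 (only ASCII alphanumerics are
# left as-is); HTML4's CR LF line-break normalization is NOT handled here.

def _is_ascii_alnum(octet: int) -> bool:
    return 0x30 <= octet <= 0x39 or 0x41 <= octet <= 0x5A or 0x61 <= octet <= 0x7A


def encode_component(text: str) -> str:
    """Encode one name or value: UTF-8 octets first, then escape each octet."""
    out = []
    for octet in text.encode("utf-8"):
        if octet == 0x20:
            out.append("+")                      # space -> '+'
        elif _is_ascii_alnum(octet):
            out.append(chr(octet))               # letters and digits unchanged
        else:
            out.append("%{:02X}".format(octet))  # any other octet -> %HH
    return "".join(out)


def decode_component(encoded: str) -> str:
    """Undo the escaping to get octets, then decode those octets as UTF-8.

    Assumes ASCII input as produced by encode_component; malformed %HH
    escapes raise ValueError, invalid UTF-8 raises UnicodeDecodeError.
    """
    octets = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "+":
            octets.append(0x20)
            i += 1
        elif ch == "%":
            octets.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            octets.append(ord(ch))
            i += 1
    return octets.decode("utf-8")


def encode_form(pairs):
    """Join encoded name=value pairs with '&', preserving their order."""
    return "&".join(encode_component(n) + "=" + encode_component(v) for n, v in pairs)


def decode_form(payload):
    """Split on '&' and '=' BEFORE unescaping, so escaped '&' and '=' survive."""
    pairs = []
    for part in payload.split("&"):
        if not part:
            continue
        name, _, value = part.partition("=")
        pairs.append((decode_component(name), decode_component(value)))
    return pairs


if __name__ == "__main__":
    # The six code points from the example: SPACE, %, &, +, POUND SIGN, EURO SIGN
    value = "\u0020\u0025\u0026\u002B\u00A3\u20AC"
    print(value.encode("utf-8").hex(" ").upper())  # 20 25 26 2B C2 A3 E2 82 AC
    print(encode_component(value))                 # +%25%26%2B%C2%A3%E2%82%AC
    print(decode_form(encode_form([("q", value)])) == [("q", value)])  # True
```

Splitting on "&" and "=" before unescaping matters here: the example value itself contains "&" and "+", which only survive a round trip because they were escaped as %26 and %2B during generation.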
