Much appreciated, Julian!

On Jul 12, 2012, at 1:31 AM, Julian Reschke wrote:

> On 2012-07-09 17:01, Julian Reschke wrote:
>> On 2012-07-09 16:48, Mike Jones wrote:
>>> HTML5 is not cited because it's a working draft - not an approved
>>> standard.  In what way is "the definition of the media type in HTML4
>>> is known to be insufficient"?  People have been successfully
>>> implementing form-urlencoding with it for quite some time. :-)  Is
>>> there a specific wording change that you'd suggest that we make that
>>> doesn't involve citing a working draft, rather than an approved standard?
>> 
>> For instance, the HTML4 "definition" doesn't even mention what to do
>> with non-ASCII characters.
>> 
>> I understand that it's not particularly attractive, but citing HTML4
>> just because it's a "standard" isn't really helpful for people who
>> actually follow the link and try to understand what needs to be
>> implemented.
>> ...
> 
> Here's an attempt to describe the encoding in terms of HTML4, plus additional 
> instructions. This would need to be referenced anyway where the spec currently 
> refers to the HTML4 media type definition:
> 
> -- snip --
> Appendix X. Use of the application/x-www-form-urlencoded Media Type
> 
> At the time of publication of this specification, the 
> "application/x-www-form-urlencoded" media type was defined in Section 17.13.4 
> of [HTML4], but not registered in the IANA media types registry 
> (<http://www.iana.org/assignments/media-types/index.html>). Furthermore, the 
> definition is incomplete as it does not consider non-US-ASCII characters.
> 
> To address this shortcoming, when generating payloads using this media type, 
> names and values MUST be encoded using the "UTF-8" character encoding scheme 
> ([RFC3629]) first; the resulting octet sequence then needs to be further 
> encoded using the escaping rules defined in [HTML4].
> 
> When parsing data from a payload using this media type, the names and values 
> resulting from reversing the name/value encoding consequently need to be 
> treated as octet sequences, to be decoded using the "UTF-8" character 
> encoding scheme.
> 
> Example: A value consisting of the six Unicode code points (1) U+0020 
> (SPACE), (2) U+0025 (PERCENT SIGN), (3) U+0026 (AMPERSAND), (4) U+002B (PLUS 
> SIGN), (5) U+00A3 (POUND SIGN), and (6) U+20AC (EURO SIGN) would be encoded 
> into the octet sequence below (using hexadecimal notation):
> 
>  20 25 26 2B C2 A3 E2 82 AC
> 
> and then represented in the payload as:
> 
>  +%25%26%2B%C2%A3%E2%82%AC
> 
> -- snip --
> 
> Best regards, Julian
> _______________________________________________
> OAuth mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/oauth
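
For anyone who wants to check the worked example in the proposed appendix, here is a minimal Python sketch. It assumes that the standard library's urllib.parse.quote_plus is a close enough stand-in for the HTML4 escaping rules for this particular value; it is not a full implementation of those rules (for example, it does not normalize line breaks to CR LF). The helper names encode_value and decode_value are made up for illustration and do not come from the draft.

```python
from urllib.parse import quote_plus, unquote_plus

# The six code points from the example: SPACE, PERCENT SIGN, AMPERSAND,
# PLUS SIGN, POUND SIGN, EURO SIGN.
value = "\u0020\u0025\u0026\u002B\u00A3\u20AC"


def encode_value(text: str) -> str:
    """Hypothetical helper: UTF-8 encode first, then escape the octets.

    quote_plus performs both steps; SPACE becomes '+', and reserved or
    non-ASCII octets become %HH.
    """
    return quote_plus(text, encoding="utf-8")


def decode_value(escaped: str) -> str:
    """Hypothetical helper: reverse the escaping, then decode as UTF-8.

    errors="strict" makes invalid UTF-8 raise UnicodeDecodeError instead of
    being silently replaced.
    """
    return unquote_plus(escaped, encoding="utf-8", errors="strict")


# Step 1: the intermediate octet sequence, in hexadecimal notation.
octets = value.encode("utf-8").hex(" ").upper()

# Step 2: the representation that goes into the payload.
escaped = encode_value(value)

print(octets)
print(escaped)
```

Decoding with decode_value(escaped) should give back the original six code points, which matches the parsing rule in the proposed text: reverse the name/value escaping, then treat the result as UTF-8 octets.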
