-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Konstantin,

On 11/18/16 2:10 PM, Konstantin Kolinko wrote:
> One more authority, that I forgot to mention in my mail: IANA
> registry of mime types
> 
> Registry: 
> https://www.iana.org/assignments/media-types/media-types.xhtml
> 
> Registration entry for "application/x-www-form-urlencoded" 
> https://www.iana.org/assignments/media-types/application/x-www-form-urlencoded
>
>  -> Encoding considerations : 7bit
> 
> According to RFC defining this registry, it means that the data is 
> 7-bit ASCII only. https://tools.ietf.org/html/rfc6838#section-4.8

Oh, that's the nail in the coffin.

application/x-www-form-urlencoded from W3C says "if the character
doesn't fit into the encoding of the message, it must be %-encoded"
but it never says what "the encoding of the message" actually is. My
worry was that it was mutable, and that UTF-8 was a valid encoding,
meaning that 0xc2 0xae on the wire would have been acceptable (rather
than %C2%AE).

If application/x-www-form-urlencoded is *absolutely* supposed to be
7-bit ASCII, then nothing above 0x7f can ever be legally transferred
across the wire when using that content-type.
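
As a rough illustration of that 7-bit constraint, here is a minimal
Python sketch. The helper name is_7bit and the sample field "name" are
made up for this example; they do not come from any spec or from Tomcat.

```python
from urllib.parse import quote


def is_7bit(body: bytes) -> bool:
    """Return True when every byte is in the US-ASCII range 0x00-0x7f."""
    return all(b <= 0x7F for b in body)


# Raw UTF-8 on the wire: the two bytes 0xc2 0xae are both above 0x7f.
raw = "name=®".encode("utf-8")

# Percent-encoded form: the same two bytes written out as ASCII "%C2%AE".
escaped = ("name=" + quote("®", safe="", encoding="utf-8")).encode("ascii")

print(is_7bit(raw))      # False
print(escaped)           # b'name=%C2%AE'
print(is_7bit(escaped))  # True
```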

This solves André's problem with this content-type where he wanted to
specify the charset to be used. It seems the standard defines the
character set: US-ASCII.

The only problem now is that it's not clear how to turn %C2%AE into a
character because you have to know that UTF-8 and not Shift-JIS or
whatever is being used.
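
To make the ambiguity concrete, the sketch below url-decodes %C2%AE to
its two octets and then interprets them under three candidate charsets.
The list of charsets is only an illustrative choice; nothing in the
request itself says which one the sender meant.

```python
from urllib.parse import unquote_to_bytes

# Step 1: percent-decoding is unambiguous and yields raw octets.
octets = unquote_to_bytes("%C2%AE")  # b'\xc2\xae'

# Step 2: turning octets into characters depends entirely on the charset.
for charset in ("utf-8", "iso-8859-1", "shift_jis"):
    text = octets.decode(charset, errors="replace")
    print(charset, repr(text), len(text))
```

Under UTF-8 the two octets form the single character "®"; under
ISO-8859-1 and Shift-JIS each octet is read as its own character, so
the same wire bytes produce a different two-character string.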

> -> Required parameters : No parameters -> Optional parameters :  No
> parameters
> 
> OK. So no charset= parameter is allowed. My advice to specify the
> charset parameter was wrong.

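Agreed: the registration for this type defines no parameters, so
specifying a charset for it is against the spec. (charset is mostly a
text/* parameter, though a few other types, such as application/xml,
do define one.)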

> Though historically ~10 years ago I saw 
> "application/x-www-form-urlencoded;charset=UTF-8" Content-Type in
> the wild.

Oh, I'm sure you saw it. I even tossed that into my client to see if
it would make a difference. Not surprisingly, it did not.

> It was a web site authored in WML (Wireless Markup Language) and 
> accessed via WAP protocol by mobile phones.
> 
> (Specification reference for this WML/WAP usage: 
> http://technical.openmobilealliance.org/Technical/release_program/docs/Browsing/V2_3-20070227-C/WAP-191-WML-20000219-a.pdf
>
>  Document title: WAP WML WAP-191-WML 19 February 2000
> 
> Wireless Application Protocol Wireless Markup Language
> Specification Version 1.3
> 
> -> Page 30 of 110 (in Section "9.5.1 The Go Element"): There is a
> table, where the following line is relevant:
> 
> Method: post Enctype: application/x-www-form-urlencoded Process:
> [...] The Content-Type header must include the charset parameter to
> indicate the character encoding.
> 
> I suspect that the above URL is not the official location of the 
> document. I found it through Googling. Official location should be
> http://www.wapforum.org/what/technical.htm )
> 
> 
> Apache Tomcat supports the use of charset parameter with
> Content-Type application/x-www-form-urlencoded in POST requests.

Interesting. I suspect that's because there are practical situations
where "being liberal with what you accept" is more appropriate than
angrily demanding that all clients be 100% spec-compliant :)

The (illegal) charset parameter can only mean one thing: the character
encoding to use to assemble url-decoded bytes into an actual string
value (e.g. %C2%AE -> 0xc2 0xae -> "®" when using UTF-8).
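
Here is a minimal sketch of that interpretation in Python. It is not
how Tomcat implements it. The function name decode_form and the UTF-8
fallback are assumptions made for the example; a real server may well
fall back to a different default.

```python
from email.message import Message
from urllib.parse import parse_qs


def decode_form(content_type: str, body: bytes,
                default_charset: str = "utf-8") -> dict:
    """Decode a form body using the (non-standard) charset parameter."""
    # Reuse the stdlib header parser to pull out the charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_param("charset", default_charset)

    # The body itself must be 7-bit; decoding as ASCII raises otherwise.
    query = body.decode("ascii")

    # parse_qs url-decodes to bytes, then assembles them using `charset`.
    return parse_qs(query, encoding=charset, errors="strict")


body = b"sym=%C2%AE"
print(decode_form("application/x-www-form-urlencoded;charset=UTF-8", body))
print(decode_form("application/x-www-form-urlencoded;charset=ISO-8859-1", body))
```

The first call yields {'sym': ['®']}; the second yields a two-character
value for the same bytes, which is why the receiver needs to learn the
charset from somewhere.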

Thanks for that final reference; it really does close the case on this
whole thing.

- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJYL1YQAAoJEBzwKT+lPKRYyAkP/3Udkqjiqa7BhRH2Gxo8WhNf
Wm7BbWGS8vlgbHH/0mNzFPSxGi7mWxlimaGnc+H8fqk54RZCeNaqQPqPXhG7ldA1
QtR/1H1kXoqUNFmqnj3FBgA6UBZhql9RyLZLbeHdZMK9i1sN4bI/CEa2EP5rZ+0d
0sXXj8wRz+yk2bXtdyuW8yHzQRNB/+XJbOrQBVqc+u//K/+q9I8eEN0SlZo8+9t2
9hqqcufhd9YtuH1Ypn1M73l72WFWad7BEgPPG+noLcB8/OrSXfeF2ELEe9dzv6r6
Jyxas6uUiplE8+/1QTu8MYSGqeo3l/xgixCD9gEMLNFBlcLPlQcRhaoQ08bgZOcT
SyzVIYYCL7R7MsB1f3QFDEax0vwIi0a6Zrfaa3oqklXEhNuVk/Ani8+sbFw01iHW
ZxV6vc0v9APMOg3jVQug3UC1kAGcZi8toISKyrFt9lwK0AbDrSVKfe4sKql91yQm
wQCG3e/RjoSo1LEmh9yszurNtOy2ecqTBkIS2cksf4crYSqpefCyB/GpnrJaHMvx
P/PQ0hVZUg05Z/tj7Dxma5mWrlm9IQBC+inDiwIEnl9hGp67KfxZAEk8hUstDBWw
AK78+DsseGpyx40o6scDz8dR9ThnTHm3k0zhdUZoORwfft78Ar0HYjZCDQArhuMK
BDGqIegIrNeJtCDnYOdq
=nJCy
-----END PGP SIGNATURE-----
