"Stephen R. van den Berg" <[email protected]> wrote:
> Perform proper characterset decoding for multipart/form-data.

This looks like a beehive. :( Clearly, Roxen doesn't handle charsets correctly. Worse, at least Firefox 3 apparently doesn't handle them correctly either:

According to RFC 2388 and HTML 4.01, each part should have a Content-Type with a charset, but Firefox doesn't provide any, even when it sends the data in utf-8. Furthermore, it doesn't always use utf-8. E.g. if the <form> tag has an accept-charset attribute naming something else, Firefox uses that charset without any hint of it in the submitted form data.

The RFC says clearly that if a Content-Type header isn't present, it defaults to text/plain. And the default charset when a content type has no charset parameter must be us-ascii according to the MIME RFC (2046, section 4.1.2).

So to begin with, the fix should obey the charset provided with each form-data part, if there is one. But what's a good way to cope with the broken Firefox behavior? Your patch uses the same approach as for url-encoded variables, complete with the Roxen automatic charset variable hack. The problem is that this is in direct violation of the standards. :(

I guess the next step is to see how other browsers behave, and to see whether the Mozilla folks have something to say about why they still don't correctly implement a 10-year-old standard.

The fun doesn't stop with the charsets, for that matter. The RFC and the HTML standard say that multiple files submitted for the same field should be encoded as multipart/mixed within multipart/form-data. Firefox doesn't do that either; instead it just sends more multipart/form-data parts with the same form name. Anyway, in that case it's not difficult for Roxen to support both (it already handles the broken Firefox way, but not the correct way).

Footnote: This is what my FF 3.0.7 sends in my little test case. Everything is utf-8 encoded, but there is no charset specification anywhere.

"POST /test/charset.html HTTP/1.1\r\n"
"Host: localhost:14741\r\n"
"User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.7) Gecko/2009030423 Ubuntu/8.10 (intrepid) Firefox/3.0.7\r\n"
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
"Accept-Encoding: gzip,deflate\r\n"
"Accept-Charset: UTF-8,*\r\n"
"Keep-Alive: 300\r\n"
"Connection: keep-alive\r\n"
"Referer: http://localhost:14741/test/charset.html\r\n"
"Cookie: RoxenUserID=2db20b201ff3d0aa16377e692847bce1\r\n"
"Content-Type: multipart/form-data; boundary=---------------------------19179964687172196451241852838\r\n"
"Content-Length: 994\r\n"
"\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"barf\"\r\n"
"\r\n"
"\303\245\303\244\303\266\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"m\303\266h\303\244tta\"\r\n"
"\r\n"
"\303\245\303\244\303\266\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"ok\"\r\n"
"\r\n"
"Submit Query\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"file\"; filename=\"foo.pike\"\r\n"
"Content-Type: application/octet-stream\r\n"
"\r\n"
"int main()\n"
"{\n"
" multiset m = (<1, 2, 3>);\n"
" foreach (m; mixed y;) {\n"
" foreach (m; mixed x;) {\n"
" werror (\"del %O\\n\", x);\n"
" m[x] = 0;\n"
" }\n"
" werror (\"%O\\n\", y);\n"
" }\n"
"}\n"
"\r\n"
"-----------------------------19179964687172196451241852838\r\n"
"Content-Disposition: form-data; name=\"file\"; filename=\"bar.pike\"\r\n"
"Content-Type: application/octet-stream\r\n"
"\r\n"
"void x () {werror (\".\\n\");}\n"
"\n"
"int main()\n"
"{\n"
" mixed p = x();\n"
"}\n"
"\r\n"
"-----------------------------19179964687172196451241852838--\r\n"
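
To make the decoding rule discussed above concrete, here is a rough sketch in Python (not Pike, and not Roxen's actual code). It obeys a part's own charset parameter when there is one and otherwise falls back to the us-ascii default from RFC 2046. The function name decode_part and the browser_fallback parameter are made up for illustration; the fallback stands in for whatever workaround ends up being chosen for browsers that omit the charset, and using it is the non-standard part.

```python
from email.parser import BytesParser
from typing import Optional

# Default charset for text/plain when no charset parameter is given
# (RFC 2046, section 4.1.2).
RFC_DEFAULT_CHARSET = "us-ascii"


def decode_part(raw_part: bytes, browser_fallback: Optional[str] = None) -> str:
    """Decode the body of one form-data part to a string.

    raw_part is the part's headers, a blank line, and the body, without
    the CRLF that precedes the next boundary.  browser_fallback is a
    hypothetical knob: the charset to assume when the part carries no
    charset of its own.  Leaving it as None gives the strict behavior.
    """
    head, _, body = raw_part.partition(b"\r\n\r\n")
    headers = BytesParser().parsebytes(head + b"\r\n\r\n", headersonly=True)

    # Returns None when there is no Content-Type header or no charset
    # parameter on it.
    charset = headers.get_content_charset()
    if charset is None:
        charset = browser_fallback or RFC_DEFAULT_CHARSET
    return body.decode(charset)


if __name__ == "__main__":
    # The "barf" part from the dump above: utf-8 bytes, no charset given.
    unlabelled = (b'Content-Disposition: form-data; name="barf"\r\n'
                  b"\r\n"
                  b"\xc3\xa5\xc3\xa4\xc3\xb6")
    print(decode_part(unlabelled, browser_fallback="utf-8"))

    # A part that does declare its charset wins over any fallback.
    labelled = (b'Content-Disposition: form-data; name="barf"\r\n'
                b"Content-Type: text/plain; charset=iso-8859-1\r\n"
                b"\r\n"
                b"\xe5\xe4\xf6")
    print(decode_part(labelled, browser_fallback="utf-8"))
```

With browser_fallback left unset, the unlabelled part fails to decode as us-ascii, which is what the standards strictly call for and exactly why the Firefox behavior is a problem.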
