Gregor Schneider wrote:
I found this one:
http://www.w3.org/TR/html401/interact/forms.html#adef-accept-charset
Actually, it's not clear to me why Tomcat should assume the input is
encoded in ISO-8859-1 when one can state in detail how the form data
is encoded.
If I understand it correctly, one can even *force* any client (as long
as the client follows the specs) to encode the form data using the
"accept-charset" attribute of the <form> element.
IOW:
Setting accept-charset="UTF-8" should solve the problems.
Comments, anyone?
Yes.
But no, it does not seem to work.
I was under the same impression as you, and I already knew about
<form accept-charset=..>.
But I just tested this in Firefox 2 and in IE 6, and it does not work as
expected.
This is my test:
1) I created an HTML page as follows:
-- begin --
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form name="f1" action="http://mira.wissensbank.com/pcgi/printenv.pl"
method="POST"
enctype="multipart/form-data" accept-charset="UTF-8">
First param: <input name="param1" type="text" value="andré"><br/>
Second param: <input name="param2" type="text" value="gregör"><br/>
<input name="go" type="submit" value="GO"><br/>
</form>
</body>
</html>
-- end --
The above file was created with a UTF-8 aware editor, and the characters
in it (in "andré" and "gregör"; the umlaut is mine, as a test) are
encoded as UTF-8. I saved the file as UTF-8 without BOM. As you can
see, the document contains a <meta> tag indicating the page encoding,
and the form contains an "accept-charset" attribute naming the same
encoding.
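For reference when reading captures at the byte level, here is a small
Python sketch of what the two test values look like in each of the two
candidate encodings. The helper name encoded_forms is made up for this
illustration; it is not part of any tool mentioned here.

```python
# Byte-level view of the two test values under both candidate encodings.
# Requires Python 3.8+ for bytes.hex() with a separator.

def encoded_forms(text):
    """Return the hex bytes of `text` in UTF-8 and in ISO-8859-1."""
    return {
        "utf-8": text.encode("utf-8").hex(" "),
        "iso-8859-1": text.encode("iso-8859-1").hex(" "),
    }

# "\u00e9" is e-acute, "\u00f6" is o-umlaut (written as escapes so the
# script does not depend on the encoding of its own source file).
for value in ("andr\u00e9", "greg\u00f6r"):
    print(value, encoded_forms(value))
```

In UTF-8 each accented character takes two bytes, in ISO-8859-1 only
one, so a hex dump of a request body shows at once which encoding the
browser really used.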
2) I opened this file in Firefox 2.0 and clicked the GO button.
Since I opened this as a local file, there is no "Content-Type" header
coming from a server to confuse things.
In Firefox, I have the LiveHttpHeaders plugin installed, which allows me
to see the request as sent to the server, and save a copy of it, which I
did. This is the result :
-- begin --
POST /pcgi/printenv.pl HTTP/1.1
Host: mira.wissensbank.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.15) Gecko/20080623 Firefox/2.0.0.15
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-gb,en;q=0.7,de-de;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------218302158314236
Content-Length: 350

-----------------------------218302158314236
Content-Disposition: form-data; name="param1"

andré
-----------------------------218302158314236
Content-Disposition: form-data; name="param2"

gregör
-----------------------------218302158314236
Content-Disposition: form-data; name="go"

GO
-----------------------------218302158314236--
-- end --
3) I did the same in Internet Explorer 6.0, using a tool of similar
functionality (Fiddler), with which I can capture the whole request.
Here it is:
-- begin --
POST /pcgi/printenv.pl HTTP/1.1
Accept: */*
Accept-Language: de
Content-Type: multipart/form-data; boundary=---------------------------7d98c5bb072c
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
Host: mira.wissensbank.com
Content-Length: 338
Connection: Keep-Alive
Pragma: no-cache

-----------------------------7d98c5bb072c
Content-Disposition: form-data; name="param1"

andré
-----------------------------7d98c5bb072c
Content-Disposition: form-data; name="param2"

gregör
-----------------------------7d98c5bb072c
Content-Disposition: form-data; name="go"

GO
-----------------------------7d98c5bb072c--
-- end --
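The same check can be done mechanically. Below is a minimal Python
sketch that rebuilds a request shaped like the Firefox capture and asks,
for each part, which charset it declares. It uses the standard "email"
package as a generic MIME parser, which is a convenience of mine and not
how a servlet container parses requests. The helpers build_request and
declared_charsets are hypothetical names, and the body bytes are assumed
to be UTF-8 (the capture as displayed does not prove that).

```python
from email import message_from_bytes

# Boundary value copied from the Content-Type header of the Firefox capture.
BOUNDARY = "---------------------------218302158314236"

def build_request(values):
    """Rebuild a request like the capture: no Content-Type inside any part."""
    lines = []
    for name, value in values:
        lines += [
            "--" + BOUNDARY,
            'Content-Disposition: form-data; name="%s"' % name,
            "",
            value,
        ]
    lines.append("--" + BOUNDARY + "--")
    # Assumption: the browser sent the field values as UTF-8 bytes.
    body = "\r\n".join(lines).encode("utf-8") + b"\r\n"
    header = "Content-Type: multipart/form-data; boundary=%s\r\n\r\n" % BOUNDARY
    return header.encode("ascii") + body

def declared_charsets(raw):
    """Map each field name to the charset its part declares (None if absent)."""
    message = message_from_bytes(raw)
    return {
        part.get_param("name", header="content-disposition"):
            part.get_content_charset()
        for part in message.get_payload()
    }

raw = build_request(
    [("param1", "andr\u00e9"), ("param2", "greg\u00f6r"), ("go", "GO")]
)
print(declared_charsets(raw))
```

Every field comes back with None, which is to say that nothing in the
request tells the receiver how to decode the bytes.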
So, as anyone can see, neither of these browsers adds any charset
information to the POST, either on the request's Content-Type header or
on the individual parts. I personally find that very strange, and hardly
in the spirit of the specs.
Which tends to confirm the note in SRV 4.9 of the Servlet Specs 2.4/2.5 :
"Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests."
Which also seems to contradict the HTML specs which you mention :
http://www.w3.org/TR/html401/interact/forms.html#h-17.13
and following paragraphs. (Note by the way the "Note" at the end of 17.13.1)
In particular, this one from section "17.13.4 Form content types" :
As with all multipart MIME types, each part has an optional
"Content-Type" header that defaults to "text/plain". User agents should
supply the "Content-Type" header, accompanied by a "charset" parameter.
Well, Firefox 2.0 and IE 6.0 don't supply a per-part "Content-Type",
let alone a charset.
In the case of IE 6.0, I am not really surprised, but in the case of
Firefox, who would have thunk ?
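For comparison, this is roughly what a part would look like if a user
agent did follow that recommendation. It is a Python sketch of my
reading of section 17.13.4, not something either browser produced; the
function name form_data_part and the boundary "example-boundary" are
invented for the illustration.

```python
def form_data_part(boundary, name, value, charset="utf-8"):
    """Build one multipart/form-data part that declares its own charset."""
    head = (
        "--%s\r\n"
        'Content-Disposition: form-data; name="%s"\r\n'
        "Content-Type: text/plain; charset=%s\r\n"
        "\r\n" % (boundary, name, charset)
    )
    # The headers are plain ASCII; only the value uses the declared charset.
    return head.encode("ascii") + value.encode(charset) + b"\r\n"

part = form_data_part("example-boundary", "param1", "andr\u00e9")
print(part.decode("utf-8"))
```

With that one extra header line in each part, the receiving side would
no longer have to guess.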
Anyway, it kind of puts a spin on what I posted here before: even with
an HTML form that should have everything in it to leave the browser no
choice, the servlet engine still does not get any information about the
real character set of the data sent by the browser.
Which, in this day and age, I personally find absolutely terrible.
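To make the consequence concrete, here is a minimal Python sketch of a
receiver that uses the charset from the Content-Type value when there is
one and otherwise falls back to ISO-8859-1, the default discussed in
this thread. This is not Tomcat's code; request_charset and decode_field
are hypothetical helpers, and the "email" package is used only to parse
the header value.

```python
from email.message import Message

def request_charset(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type value, else the default."""
    header = Message()
    header["Content-Type"] = content_type
    return header.get_content_charset() or default

def decode_field(raw, content_type):
    """Decode raw field bytes with whatever charset the request declares."""
    return raw.decode(request_charset(content_type))

# Assumption: the browser encoded the field value as UTF-8.
sent = "andr\u00e9".encode("utf-8")

# No charset declared: the two UTF-8 bytes of the accented letter are
# read as two separate ISO-8859-1 characters.
print(decode_field(sent, "multipart/form-data; boundary=x"))

# Charset declared: the value round-trips intact.
print(decode_field(sent, "text/plain; charset=UTF-8"))
```

The first call prints the familiar garbage with an A-tilde in it, the
second prints the name as typed.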
I will now try to re-test this with Firefox 3 and IE 7.
Update: just tested with Firefox 3.1 beta; it does not send a
Content-Type or charset either.
I am puzzled as to why.