Don't bother fiddling with <FORM> attributes. I've tried that before, to no avail.

Right now, no matter what encoding you specify in an HTML page, most browsers
(all the popular IE and Netscape Navigator flavors) ignore it altogether and encode the
form data using the encoding in which the page containing the form was sent to them.
Worse yet, they *don't* specify the encoding of the characters in the form data when
sending them back in a POST request, so on the server side you must know what the
encoding of the page that contained the form was. The Servlet 2.3 spec is meant to
contain a solution for this (it adds ServletRequest.setCharacterEncoding()), but I
don't know how (or whether) it is implemented in Tomcat 4.x.
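To make the problem concrete: the POST body carries only percent-encoded bytes, so the
server has to pick a charset to turn them back into characters, and different guesses
give different text. A minimal sketch using the standard java.net.URLDecoder; the class
name FormDecodingDemo, the helper decodeParam and the sample value "%F5" are hypothetical,
chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormDecodingDemo {
    // Decodes one url-encoded form value using the charset the server *assumes*
    // the form page was sent in; the request itself does not say which one it was.
    static String decodeParam(String raw, String pageCharset) {
        try {
            return URLDecoder.decode(raw, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + pageCharset, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical raw value: a browser showing an ISO-8859-2 page sends the
        // Hungarian letter "ő" as the single byte 0xF5, i.e. "%F5".
        String raw = "%F5";
        String right = decodeParam(raw, "ISO-8859-2");
        String wrong = decodeParam(raw, "ISO-8859-1");
        System.out.printf("U+%04X%n", (int) right.charAt(0)); // U+0151 ("ő", what the user typed)
        System.out.printf("U+%04X%n", (int) wrong.charAt(0)); // U+00F5 ("õ", wrong guess)
    }
}
```

Same bytes, two different characters: only knowledge of the page's encoding tells the
server which one is right.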

As if all of the above weren't enough, Tomcat 3.x deals yet another blow to
internationalization efforts: it blindly interprets all form data as
ISO-8859-1 (~Cp1252), so your ISO-8859-2 (~Cp1250) characters are lost. Again, I
don't know how the Tomcat 4.x line handles this.
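What Tomcat 3.x does can be simulated in a few lines: the bytes an ISO-8859-2 page
produces for "ő" and "ű" (0xF5, 0xFB) come out as "õ" and "û" when decoded as
ISO-8859-1. A minimal sketch; Latin1MisreadDemo and decodeAsLatin1 are hypothetical
names, and the container is only simulated with new String(bytes, charset). Because
ISO-8859-1 maps each byte 0x00-0xFF to the character with the same value, the original
bytes are still recoverable from the wrongly decoded string, which is what the
transcoding trick further down relies on.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class Latin1MisreadDemo {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // Simulates a container that decodes every form byte as ISO-8859-1,
    // regardless of the encoding of the page that held the form.
    static String decodeAsLatin1(byte[] formBytes) {
        return new String(formBytes, LATIN1);
    }

    public static void main(String[] args) {
        // "őű" as an ISO-8859-2 page would submit them: bytes 0xF5 0xFB.
        byte[] sent = "\u0151\u0171".getBytes(LATIN2);
        String received = decodeAsLatin1(sent);
        for (char c : received.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+00F5, then U+00FB ("õû")
        }
        // ISO-8859-1 decoding is byte-for-byte reversible, so the bytes survive.
        System.out.println(Arrays.equals(received.getBytes(LATIN1), sent)); // true
    }
}
```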

Being Hungarian, I'm just as interested in entering 8859-2 characters in my pages
and not seeing ? marks on the server side, so I transcode all form data strings on
the fly. The straightforward solution looks like this:

param = new String(param.getBytes("8859_1"), "8859_2");

although this tends to be slow (it runs through Java's char-to-byte and then its
byte-to-char machinery). I have developed a fast 8859-1 to 8859-2 transcoder that
addresses the speed issue; contact me by private mail and I can send it to you.
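For illustration only (this is not the transcoder offered above), one common way to skip
the char-to-byte-to-char round trip is a 256-entry lookup table built once from the JDK's
ISO-8859-2 decoder. It works because a string decoded as ISO-8859-1 only contains
characters U+0000..U+00FF whose values equal the original bytes. Latin2Transcoder and
transcode are hypothetical names, and no speed measurements are claimed here.

```java
import java.nio.charset.Charset;

public class Latin2Transcoder {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");
    static final Charset LATIN2 = Charset.forName("ISO-8859-2");

    // TABLE[b] = the character that byte value b means in ISO-8859-2.
    // Declared after LATIN2 so the charset is initialized first.
    private static final char[] TABLE = buildTable();

    private static char[] buildTable() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        // Single-byte charset: exactly one char per input byte.
        return new String(all, LATIN2).toCharArray();
    }

    // Turns a string that was wrongly decoded as ISO-8859-1 into the
    // ISO-8859-2 text the user actually entered, with no intermediate byte[].
    static String transcode(String latin1Decoded) {
        char[] out = new char[latin1Decoded.length()];
        for (int i = 0; i < out.length; i++) {
            char c = latin1Decoded.charAt(i);
            // Chars above 0xFF cannot come from ISO-8859-1 decoding; keep them as-is.
            out[i] = c < 256 ? TABLE[c] : c;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String param = "\u00F5\u00FB"; // "õû", as a Latin-1-only container hands it over
        String fast = transcode(param);
        // The getBytes/new String line from the text, with canonical charset names.
        String slow = new String(param.getBytes(LATIN1), LATIN2);
        System.out.println(fast.equals(slow)); // true
        System.out.printf("U+%04X U+%04X%n", (int) fast.charAt(0), (int) fast.charAt(1)); // U+0151 U+0171
    }
}
```

One design difference: for characters above U+00FF the getBytes("8859_1") version yields
'?', while this sketch passes them through unchanged.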

Cheers,
  Attila.
--
Attila Szegedi
home: http://www.szegedi.org

----- Original Message ----- 
From: "Nikola Milutinovic" <[EMAIL PROTECTED]>
To: "Tomcat Users List" <[EMAIL PROTECTED]>
Sent: February 18, 2002 15:17
Subject: Re: Input from a FORM - encoding problem


> > <quote>
> > FORM attribute
> > 
> > accept-charset = charset list [CI]
> >     This attribute specifies the list of character encodings for input data that 
> > is accepted by the server processing this form. The value is a space- and/or
> > comma-delimited list of charset values. The client must interpret this list as an
> > exclusive-or list, i.e., the server is able to accept any single character
> > encoding per entity received.
> 
> This bit is a "bit unclear" to me. If I specify several encodings, how will the
> browser know which one was actually used? How will the server know which one was used?
> 
> Nix.
> 

