Re: Struts 1.2.9 and UNICODE issue on form submittal

Adam Gordon Thu, 31 May 2007 10:32:52 -0700

Chris-

See answers below...


Christopher Schultz wrote:

I just ran it through Wireshark and followed the TCP stream verifying
 that it's being encoded correctly into 3 bytes - it is (E2 80 A2).


Is this the UTF-8 code that /should/ represent those bullet characters?

I would assume so. I had a co-worker use BBEdit on his Mac to convert a document between UTF-8 and other formats, and it appears to convert them back and forth correctly. However, the W3C recommends that enctype=multipart/form-data be "used for submitting forms that contain files, non-ASCII data, and binary data." Setting this and leaving the acceptCharset and the @page directive as UTF-8 results in Wireshark not reporting any encoding, and the bullet character shows up as 3 characters.
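As a quick check of the assumption above, here is a minimal sketch that prints the UTF-8 bytes of U+2022 (BULLET) in hex. The class and method names are made up for illustration, and StandardCharsets requires Java 7 or later; on older JVMs, getBytes("UTF-8") should do the same job.

```java
import java.nio.charset.StandardCharsets;

public class BulletBytes {
    // Returns the UTF-8 bytes of s as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values format as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2022 is the BULLET character.
        System.out.println(utf8Hex("\u2022")); // prints "E2 80 A2"
    }
}
```

If this prints E2 80 A2, the bytes seen in Wireshark are the expected UTF-8 encoding of the bullet.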

Regardless, if we take the string containing the 3 characters (E2 80 A2), call charAt() for each character in the string, place all the results in a byte array, and construct a new string from those bytes, Java correctly recognizes that these characters represent a UNICODE character (I believe since String by default represents a UNICODE string) and the string length decreases by 2.
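The charAt() round trip described above can be sketched roughly as follows. This is only an illustration under a few assumptions: the class and method names are hypothetical, the garbled string is simulated by decoding the three bytes as ISO-8859-1 (which is what a servlet container would typically fall back to when no request encoding is set), and the decode step names UTF-8 explicitly rather than relying on the platform default charset that new String(bytes) would use.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Copies the low 8 bits of each char into a byte array, then decodes
    // those bytes as UTF-8. Only meaningful when every char is <= 0xFF,
    // i.e. when the original bytes were decoded one byte per char.
    static String repair(String garbled) {
        byte[] bytes = new byte[garbled.length()];
        for (int i = 0; i < garbled.length(); i++) {
            bytes[i] = (byte) garbled.charAt(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the bytes on the wire being decoded as ISO-8859-1.
        byte[] wire = {(byte) 0xE2, (byte) 0x80, (byte) 0xA2};
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);

        String fixed = repair(garbled);
        System.out.println(garbled.length() + " -> " + fixed.length()); // prints "3 -> 1"
        System.out.println(fixed.equals("\u2022"));                     // prints "true"
    }
}
```

The length drops from 3 to 1, which matches the decrease by 2 described above. Note that the trick depends on the bad decode having mapped each byte to the char with the same value; a windows-1252 decode, for example, would not survive the cast back to bytes.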

I'm going to follow up with Paul's post and try it on 1.3.8 and see
if I can reproduce.  Basically, the behavior we're seeing is that the
3 bytes are being treated as separate characters and not as one
UNICODE character.


Can you confirm that the Content-Type of the form is being submitted
with the request properly (as an HTTP header) and that the Request
object on the server-side correctly reads the Content-Type header?

Yes, using "wget -S" prints the HTTP response headers, and we can see the Content-Type header correctly set to UTF-8.


--adam
