"L.P." wrote:
> I thought the charset selection might be avoided at all if one could
> use "UTF-8" as the single, universal encoding. Browsers that support
> UTF-8 -- like recent versions of NS and IE -- should encode in UTF-8
> and send back whatever the user typed in a textbox, if the encoding is
> properly requested. So I tried the same technique with UTF-8, under
> NT4.0 Server, JSDK2.0, Netscape 4.7. I created an HTML file encoded
> in UTF-8 with Java, containing a form with a textbox parameter value
> of 3 characters. Comparing the bytes received from the browser with the
> expected UTF-8 format bytes I got this result:
> 1 byte mapping to 1 char c<=127 is sent properly
> 2 bytes mapping to 1 char 128<=c<=255 are sent properly
> 2 bytes mapping to 1 char c>255 are replaced by \u003F (QUESTION MARK)
>
> I obtained similar results (i.e. bad encoding of characters beyond \xFF)
> with Amaya.
>
> I suppose there must be common, accepted ways to handle this kind
> of problem. I would consider applet-based solutions as the last resort.
>
> I would appreciate any comment/suggestion.
> Thanks,
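The quoted measurements match how the two candidate encodings behave. UTF-8 needs one byte for characters up to 127 and two bytes for 128 to 2047. ISO-8859-1 has no byte at all for anything above 255, and Java's `String.getBytes` substitutes `?` (0x3F) for such characters. One plausible reading of the `\u003F` result is therefore that the browser encoded the field in a Latin-1 style charset instead of UTF-8. The sketch below prints both encodings for a few sample characters. The class name `CharsetCheck` and the sample characters are illustrative assumptions, not part of the original test.

```java
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // Number of bytes UTF-8 needs for a single BMP character.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes produced when the same character is forced through ISO-8859-1.
    // Characters above U+00FF have no ISO-8859-1 mapping, and
    // String.getBytes substitutes '?' (0x3F) for them.
    static byte[] latin1Bytes(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Sample characters: ASCII, Latin-1, two above U+00FF, one 3-byte case.
        char[] samples = {'A', '\u00E9', '\u0100', '\u017E', '\u0800'};
        for (char c : samples) {
            byte[] l1 = latin1Bytes(c);
            System.out.printf("U+%04X: %d UTF-8 byte(s), ISO-8859-1 -> 0x%02X%n",
                    (int) c, utf8Length(c), l1[0] & 0xFF);
        }
    }
}
```

For U+0100 it reports 2 UTF-8 bytes but 0x3F under ISO-8859-1, the same question mark described in the test above.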
I tried characters 128 to 511 with this servlet:
-------------------------------------------------------------------------------------------
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

/**
 * Test servlet: GET renders one text field per character in
 * [FROMCHAR, TOCHAR), pre-filled with that character; POST reports
 * every character that did not come back intact.
 */
public class Utf8try extends HttpServlet
{
    public static final int FROMCHAR = 128;
    public static final int TOCHAR = 512;   // exclusive upper bound

    public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        // The page is sent as UTF-8, so the browser should submit the form in UTF-8 too.
        response.setContentType("text/html; charset=UTF-8");
        PrintWriter out = response.getWriter();
        try {
            out.println("<HTML><BODY><FORM METHOD=POST>");
            out.println("<INPUT TYPE=SUBMIT><BR>");
            for (int i = FROMCHAR; i < TOCHAR; i++) {
                out.println(i + "<INPUT TYPE=TEXT SIZE=2 NAME=\"char" + i
                        + "\" VALUE=\"" + ((char) i) + "\">");
                if (i % 16 == 0) out.println("<BR>");
            }
            out.println("</FORM></BODY></HTML>");
        } catch (Exception ex) {
            out.println(ex.getMessage());
            ex.printStackTrace(out);
        }
    }

    public void doPost(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException
    {
        response.setContentType("text/html; charset=UTF-8");
        PrintWriter out = response.getWriter();
        try {
            out.println("<HTML><BODY>");
            for (int i = FROMCHAR; i < TOCHAR; i++) {
                String s1 = request.getParameter("char" + i);
                if (s1 == null) {   // field missing from the submission
                    out.println("char " + i + " is missing<BR>");
                    continue;
                }
                // Assumes the engine decoded the parameter as ISO-8859-1:
                // recover the raw bytes and decode them as UTF-8.
                String s2 = new String(s1.getBytes("ISO-8859-1"), "UTF-8");
                if (s2.length() == 0 || s2.charAt(0) != (char) i)
                    out.println("char " + i + " is wrong: " + ((char) i) + "/" + s2 + "<BR>");
            }
            out.println("</BODY></HTML>");
        } catch (Exception ex) {
            out.println(ex.getMessage());
            ex.printStackTrace(out);
        }
    }
}
-------------------------------------------------------------------------------------------
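The line `new String(s1.getBytes("ISO-8859-1"), "UTF-8")` in `doPost` relies on the engine having decoded the request body as ISO-8859-1, the usual default when the request declares no charset. Because ISO-8859-1 maps each byte 0 to 255 to exactly one char, re-encoding restores the original bytes, which can then be decoded as UTF-8. The sketch below shows that round trip in isolation. The names `RoundTrip`, `containerDecode` and `repair` are hypothetical, and the container's behaviour is simulated rather than taken from JServ.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Simulates an engine that decodes UTF-8 request bytes as ISO-8859-1
    // (assumed default when the request carries no charset).
    static String containerDecode(byte[] wireBytes) {
        return new String(wireBytes, StandardCharsets.ISO_8859_1);
    }

    // The fix used in doPost: ISO-8859-1 turns each char 0-255 back into
    // exactly one byte, so the original UTF-8 bytes are recovered and re-decoded.
    static String repair(String misdecoded) {
        return new String(misdecoded.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u0100\u017E";   // two characters above U+00FF
        byte[] wire = typed.getBytes(StandardCharsets.UTF_8);   // 4 bytes
        String seen = containerDecode(wire);
        System.out.println("length as decoded by engine: " + seen.length()); // 4
        System.out.println("repaired equals typed: " + repair(seen).equals(typed)); // true

        // If the browser already substituted '?', nothing can be recovered.
        System.out.println("repaired '?': " + repair(containerDecode(new byte[] {0x3F})));
    }
}
```

The last line shows the limit of the trick: when the browser has already sent `?`, the repaired value is still `?`, which the servlet then reports as a wrong character.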
I tried it with Apache JServ 1.0 as the servlet engine and
Netscape 4.7 and MSIE 5.0 (for WindowsNT) as browsers.
In MSIE 5.0 all characters came back correctly. In Netscape 4.7 most of
the characters were wrong, so it looks like a browser problem.
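To confirm that the substitution happens in the browser rather than in the servlet engine, one can look at the raw form data. With the default `application/x-www-form-urlencoded` encoding, a browser that really submits UTF-8 sends U+0100 as `%C4%80`, while a character it could not encode arrives as a question mark (normally escaped as `%3F`). A small sketch using `java.net.URLDecoder`, assuming the values have already been cut out of the POST body; the class `WireCheck` and the sample values are hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class WireCheck {
    // Decodes one application/x-www-form-urlencoded value as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical raw values for the field char256 (expected U+0100).
        String fromUtf8Browser = "%C4%80";
        String fromSubstitutingBrowser = "%3F";
        System.out.println(decodeUtf8(fromUtf8Browser).equals("\u0100")); // true
        System.out.println(decodeUtf8(fromSubstitutingBrowser));          // ?
    }
}
```

If the raw body already contains `%3F` where `%C4%80` was expected, the information was lost before the request reached the servlet.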
Martin
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INET, a.s. Mgr. Martin Kuba
Kralovopolska 139 e-mail: [EMAIL PROTECTED]
601 12 Brno WWW: http://www.inet.cz/~makub/
Czech Republic tel: +420-5-41242414/32
--------------------------------------------------------------------
___________________________________________________________________________
Archives: http://archives.java.sun.com/archives/servlet-interest.html
Resources: http://java.sun.com/products/servlet/external-resources.html