There is a bug in most browsers where, even if you set the character
encoding on the form to UTF-8 using:

<form name="menu" action="" method="POST" accept-charset="UTF-8">

the browser still does not set the content type properly when it sends
data back to the server.  You will probably find that if you call

ServletRequest.getContentType()

it will report ISO-8859-1 or Cp1250.  However, if the browser has its
encoding set to UTF-8 and you are using a UTF-8 input method, such as by
selecting UTF-8 on a CJK input tool, the browser will actually send back
UTF-8 bytes.  I noticed that you tried new String(byte[], "UTF-8") and
said that it doesn't work.  Note that getBytes() with no argument uses
the platform default encoding, which may not be the one the container
used to decode the parameter.  You might want to alter this as in the
following, first examining the request encoding and then getting the
bytes in that encoding:

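    private static final String UTF8 = "UTF-8";

    private static String transformEncoding(HttpServletRequest request, String raw) {
        // getCharacterEncoding() returns null when the browser sent no
        // charset; fall back to ISO-8859-1 in that case.
        String encoding = request.getCharacterEncoding();
        if (encoding == null) {
            encoding = "ISO-8859-1";
        }
        String transformed = null;
        if (raw != null) {
            try {
                // Recover the bytes the browser sent, then decode them as UTF-8.
                byte[] bytes = raw.getBytes(encoding);
                transformed = new String(bytes, UTF8);
            } catch (UnsupportedEncodingException e) {
                . . .
            }
        }
        return transformed;
    }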

I have had luck with this in the past, but I have to admit that I am not
out of the woods with my UTF-8 problems.
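
To see the round trip in isolation, here is a minimal sketch that runs
without a servlet container.  It assumes a modern JDK (StandardCharsets
is Java 7+, so it was not available when this thread was written), and
the class and method names are made up for illustration.  It simulates a
container that decoded UTF-8 bytes as ISO-8859-1 and then undoes that.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRepair {

    /**
     * Undo a wrong decode: turn the string back into the bytes the
     * browser sent, then decode those bytes as UTF-8.
     * (Hypothetical helper, not part of the servlet API.)
     */
    public static String redecode(String raw, Charset wronglyAssumed) {
        if (raw == null) {
            return null;
        }
        byte[] original = raw.getBytes(wronglyAssumed);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u010Daj"; // "caj" with a hacek on the c, as typed in the form

        // Simulate a container that decodes the UTF-8 bytes as ISO-8859-1.
        String garbled = new String(typed.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        String repaired = redecode(garbled, StandardCharsets.ISO_8859_1);
        System.out.println(repaired.equals(typed));
    }
}
```

The trick is lossless for ISO-8859-1 because that charset maps every
byte value to a character.  With a charset that leaves some byte values
unmapped, the wrong decode may already have thrown data away, so the
round trip is not guaranteed to recover the original text.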

Alex Amies

-----Original Message-----
From: Mark Galbreath [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 26, 2001 6:46 AM
To: [EMAIL PROTECTED]
Subject: Re: character encoding problem


Setting the request object's charset is supported in API 2.2 (and
earlier) if you import the com.oreilly.servlet.ParameterParser class.

Also, are you sure you are using "UTF-8" and not "UTF8"?  Your last
sentence makes this questionable.  I know the alias for UTF-8 in Java
1.1.5 and earlier was "UTF8", and most browsers choked on the malformed
content type.  The work-around for this now is:

res.setContentType( "text/html; charset=UTF-8");
PrintWriter out = new PrintWriter(
    new OutputStreamWriter( res.getOutputStream(), "UTF8"), true);
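
As a rough check of the two names, the sketch below (modern JDK assumed;
the class and method names are invented for the example, and a
ByteArrayOutputStream stands in for res.getOutputStream()) asks Java for
the canonical name behind the "UTF8" label and encodes one character
through the same kind of wrapped PrintWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class Utf8WriterDemo {

    /** Canonical name Java reports for a charset label. */
    public static String canonicalName(String label) {
        return Charset.forName(label).name();
    }

    /** Encodes text through a PrintWriter wrapped as in the work-around. */
    public static byte[] encode(String text, String javaCharsetName) {
        // Stand-in for res.getOutputStream(); no servlet container needed.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
            new OutputStreamWriter(sink, Charset.forName(javaCharsetName)), true);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Java accepts the "UTF8" label; HTTP headers should say "UTF-8".
        System.out.println(canonicalName("UTF8"));   // prints UTF-8

        byte[] bytes = encode("\u010D", "UTF8");
        System.out.println(bytes.length);            // prints 2
    }
}
```

So "UTF8" is fine as the name handed to the Java writer, while the
charset in the Content-Type header should be spelled "UTF-8".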

Finally, do you need to set the locale?
    Locale locale = new Locale( "en", "US");
for English.


Cheers!
Mark

----- Original Message -----
From: "Tomas Zeman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 26, 2001 8:54 AM
Subject: character encoding problem


> Hi all,
>
> I am still having trouble with character encoding in servlets.
>
> I want to convert all data from getParameter("form_param") to UTF-8
>
> I have this servlet
> -------------------------------------------------
> import java.io.*;
> import java.util.*;
> import javax.servlet.http.*;
> import javax.servlet.*;
>
> public class HelloServlet extends HttpServlet {
>   public void doGet (HttpServletRequest req,HttpServletResponse res)
>       throws ServletException, IOException
>   {
>
>     // this line doesn't work; it needs Servlet 2.3!
>     //req.setCharacterEncoding("UTF-8");
>
>     res.setContentType("text/html;charset=UTF-8;");
>     PrintWriter pw = res.getWriter();
>
>     String par = req.getParameter("text");
>
>     // What to write here to convert the par String ?? (from iso-8859-2 and Cp1250)
>     // TODO
>     String convertedPar = new String(par.getBytes(),"UTF-8"); // but it doesn't work
>
>     pw.println("<head><meta http-equiv='Content-Type' content='text/html;charset=UTF-8;'></head>");
>
>     pw.println("Hi");
>     pw.println("<form method=\"POST\"><textarea cols='50' rows='8' name='text'></textarea><br><input type='Submit'></form>");
>     pw.println("<hr> Parameter : " + par);
>     pw.println("<hr> ConvertedParameter : " + convertedPar);
>
>     pw.close();
>
> /*
> // this code will write my parameter to the file in good encoding, but I
> // need to have par string converted
> // to UTF-8 before that to display it on the page
>
>     try {
>         FileOutputStream fos = new FileOutputStream("/tmp/1.1");
>         Writer out = new OutputStreamWriter(fos , "UTF-8");
>         out.write(par);
>         out.flush();
>         out.close();
>     } catch (IOException e) {
>         e.printStackTrace();
>     }
> */
>
>   }
>
>   public void doPost (HttpServletRequest req,HttpServletResponse res)
>       throws ServletException, IOException
>   {
>    doGet(req,res);
>   }
> }
>
> ------------------------------------------------
>
> Could anybody help me with what code to add to this servlet to convert
> all characters properly?
> (I am looking for toUTF8(String s) function)
>
> Thanks a lot
>
> Tomas Zeman
> email: [EMAIL PROTECTED]
>
>
> ___________________________________________________________________________
> To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
> of the message "signoff SERVLET-INTEREST".
>
> Archives: http://archives.java.sun.com/archives/servlet-interest.html
> Resources: http://java.sun.com/products/servlet/external-resources.html
> LISTSERV Help: http://www.lsoft.com/manuals/user/user.html
>


