OK, case closed, and it had nothing to do with XMLBeans.

There were three steps needed to make UTF-8 work all the way:

- JSP: add <%@ page language="java" pageEncoding="UTF-8"
contentType="text/html; charset=UTF-8"%> and in <head>: <meta
http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

- Tomcat catalina.bat: add set JAVA_OPTS=-Dfile.encoding=UTF-8 so that
data read from the db is interpreted correctly.

- web.xml: add a character-encoding filter as described in
http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n.html so that
characters in POSTed form data are decoded correctly.
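
For anyone hitting the same thing, here is a minimal sketch of the mismatch that the filter step addresses. It is not the filter itself: it uses only the JDK, and the class and method names are made up for illustration. It assumes the browser posts UTF-8 bytes and that the container decodes them as ISO-8859-1 when nothing tells it otherwise; a filter of the kind described in the article presumably avoids this by setting the request encoding before any parameter is read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Struts, Tomcat or XMLBeans.
class EncodingMismatchDemo {

    // Decodes raw bytes with whatever charset the receiving side assumes.
    static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        // "päätös", written with escapes so the source file encoding does not matter.
        String typed = "p\u00e4\u00e4t\u00f6s";

        // Assumption: the form is submitted as UTF-8 bytes.
        byte[] posted = typed.getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset turns each umlaut into two characters.
        String mangled = decode(posted, StandardCharsets.ISO_8859_1);
        // Decoding with the charset that was used to encode round-trips cleanly.
        String correct = decode(posted, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 decode matches input: " + mangled.equals(typed));
        System.out.println("UTF-8 decode matches input: " + correct.equals(typed));

        // The JVM default charset, which -Dfile.encoding influenced on the
        // JVMs of that era; APIs that take no explicit charset fall back to it.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

The same reasoning applies in the other direction: bytes must be decoded with the charset they were written in, at every hop between the page, the container and the db.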

thanks for your help!
Christian

> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED] 
> Sent: 29 October 2005 1:50
> To: [email protected]
> Subject: RE: character sets
> 
> 
> That goes beyond my level of understanding of JSP, does 
> anyone have experience with I18N in JSPs?
> 
> Thanks,
> Radu
> 
> -----Original Message-----
> From: Wendell, Christian [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 26, 2005 2:53 AM
> To: [email protected]
> Subject: RE: character sets
> 
> 
> Hmm, I changed my approach to go for UTF-8 all the way. So my
> JSPs now begin with
> 
> <%@ page contentType="text/html;charset=utf-8" %>
> ...
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
>   <head>
>     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
> ...
> 
> Now input is OK, i.e. the user's input is correctly inserted into the
> db, and static Scandinavian characters display all right. But db data
> is mangled: the Action gets a bean populated from the db (Unicode).
> Logging the bean, I can see that the fields contain UTF-8. Getters in
> the Action return UTF-8.
> 
> But when I put the bean into the session and display the data on the
> page using getters in the JSP, the Scandinavian characters are all
> wrong. It seems as if the JSP tries to convert from ANSI (ISO-8859-1)
> to UTF-8!? Why doesn't it realize that the data is already UTF-8 and
> just display it as such?
> 
> thanks so far,
> Christian
> 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED]
> > Sent: 26 October 2005 0:03
> > To: [email protected]
> > Subject: RE: character sets
> > 
> > 
> > Also, Christian, could you try using "ISO8859_1" rather than
> > "ISO-8859-1" for the encoding string in your code? XmlBeans is using
> > the Java names for the encodings, which seemed more consistent than
> > using the IANA names (so you can get the encoding from other Java code
> > and set it directly etc). Of course, the generated documents always
> > use the IANA names.
> > 
> > Let us know how that works,
> > Radu
> > 
> > PS Regarding the use of setSaveSubstituteCharacters(), as the name 
> > indicates, this is a save-time XmlOption, so it will not have any 
> > effect when passed as argument to newInstance(), but when passed to 
> > xmlText() or other similar methods. The reason for this is that the 
> > infoset doesn't make any difference between a character being 
> > represented as entity or as literal value.
> > 
> > -----Original Message-----
> > From: Steve Davis [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 19, 2005 7:12 AM
> > To: [email protected]
> > Subject: RE: character sets
> > 
> > 
> > I am not an expert in this area, but the following code may help:
> > 
> >   import java.io.ByteArrayOutputStream;
> >   import org.apache.xmlbeans.XmlOptions;
> >   import org.apache.xmlbeans.XmlTokenSource;
> > 
> >   /**
> >    * Gets the formatted character-encoded string representation of an
> >    * XmlTokenSource.
> >    * @param xmlTokenSource - typically an XmlBean object
> >    * @param encoding - the desired character encoding
> >    * @return String ready for transmission
> >    * @throws Exception
> >    */
> >   public static String getEncodedXmlText(XmlTokenSource xmlTokenSource,
> >       String encoding)
> >       throws Exception
> >   {
> >     // Set up various properties of the XML instance document
> >     xmlTokenSource.documentProperties().setEncoding(encoding);
> >     xmlTokenSource.documentProperties().setVersion("1.0");
> >     XmlOptions xmlOptions = new XmlOptions();
> >     xmlOptions.setCharacterEncoding(encoding);
> >     xmlOptions.setUseDefaultNamespace();
> >     xmlOptions.setSaveAggressiveNamespaces();
> > 
> >     // Format to a buffer and read it back into a string, decoding
> >     // with the same encoding that was used to write the bytes
> >     ByteArrayOutputStream bos = new ByteArrayOutputStream();
> >     xmlTokenSource.save(bos, xmlOptions);
> >     return bos.toString(encoding);
> >   }
> > 
> > 
> > -----Original Message-----
> > From: Wendell, Christian [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 19, 2005 4:31 AM
> > To: [email protected]
> > Cc: Kekkonen, Jari
> > Subject: character sets
> > 
> > 
> > Hi
> > 
> > We have a Struts/tomcat solution, the jsp files using the 8859-1
> > character set, on top of an Oracle db, which uses UTF-8/Unicode. Of
> > course, in Scandinavia, we have Scandinavian characters. The problem
> > is putting stuff from the html input boxes into the db, and reading
> > stuff from the db to the page, so that the umlaut chars work.
> > 
> > The bean models we use are generated from schemas with XMLBean tools.
> > In the action that reads stuff from the page and sends it to the db,
> > we try to put the encoding into the XML header, but fail:
> > 
> >   XmlOptions opts= new XmlOptions();
> >   /* we'd also like to encode '>', using the newest devel version,
> >      which also has no effect:
> >     XmlOptionCharEscapeMap escapes= new XmlOptionCharEscapeMap();
> >     escapes.addMapping('>', XmlOptionCharEscapeMap.PREDEF_ENTITY);
> >     opts.setSaveSubstituteCharacters(escapes);
> >   */
> >   opts.setCharacterEncoding("ISO-8859-1");
> >   Note addedNote= Note.Factory.newInstance(opts);   //The bean
> >   addedNote.setNote(text);  //text contains umlaut characters from the jsp
> > 
> > Logging the xml structure, we can't see any difference in the 
> > generated xml whether we do setCharacterEncoding() or not.
> > 
> > Is our strategy right but implementation wrong, or should we do this
> > somehow else?
> > 
> > Christian
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
