OK, case closed, and it had nothing to do with XMLBeans. There were three steps needed to make UTF-8 work all the way:
- Jsp: add <%@ page language="java" pageEncoding="utf8" contentType="text/html; charset=UTF8"%> and in <head>: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
- Tomcat Catalina.bat: add set JAVA_OPTS=-Dfile.encoding=UTF-8 to interpret data from the db correctly.
- Web.xml: add a filter as in http://www.javaworld.com/javaworld/jw-05-2004/jw-0524-i18n.html to make POSTs send correctly encoded characters.

thanks for your help!
Christian

> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED]
> Sent: 29 October 2005 1:50
> To: [email protected]
> Subject: RE: character sets
>
>
> That goes beyond my level of understanding of JSP, does
> anyone have experience with I18N in JSPs?
>
> Thanks,
> Radu
>
> -----Original Message-----
> From: Wendell, Christian [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, October 26, 2005 2:53 AM
> To: [email protected]
> Subject: RE: character sets
>
>
> Hmm, I changed my angle to go for UTF-8 all the way. So my
> jsps now have
>
> <%@ page contentType="text/html;charset=utf-8" %>
> ...
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
> ...
>
> in the beginning. Now input is OK, i.e. user's input is correctly
> inserted into the db. Static Scandinavian text now displays all
> right. But db data is mangled: The Action gets a bean populated from
> the db (Unicode). Logging the bean, I can see that the fields contain
> UTF-8. Getters in the Action return UTF-8.
>
> But when I put the bean into the session, and using getters in JSP,
> display the data on the page, the Scandinavian characters are all
> wrong. It seems like the jsp tries to convert from ANSI (ISO-8859-1)
> to UTF-8!?
> Why doesn't it realize that the data is already UTF-8 and just
> display it as such?
>
> thanks so far,
> Christian
>
>
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED]
> > Sent: 26 October 2005 0:03
> > To: [email protected]
> > Subject: RE: character sets
> >
> >
> > Also, Christian, could you try using "ISO8859_1" rather than
> > "ISO-8859-1" for the encoding string in your code? XmlBeans is using
> > the Java names for the encodings, which seemed more consistent than
> > using the IANA names (so you can get the encoding from other Java
> > code and set it directly etc). Of course, the generated documents
> > always use the IANA names.
> >
> > Let us know how that works,
> > Radu
> >
> > PS Regarding the use of setSaveSubstituteCharacters(), as the name
> > indicates, this is a save-time XmlOption, so it has no effect when
> > passed as an argument to newInstance(), only when passed to
> > xmlText() or other similar methods. The reason for this is that the
> > infoset doesn't make any difference between a character being
> > represented as entity or as literal value.
> >
> > -----Original Message-----
> > From: Steve Davis [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 19, 2005 7:12 AM
> > To: [email protected]
> > Subject: RE: character sets
> >
> >
> > I am not an expert in this area, but the following code may help:
> >
> > /**
> >  * Gets the formatted character-encoded string representation of an
> >  * XmlTokenSource.
> >  * @param xmlTokenSource - typically an XmlBean object
> >  * @param encoding - the desired character encoding
> >  * @return String ready for transmission
> >  * @throws Exception
> >  */
> > public static String getEncodedXmlText(XmlTokenSource xmlTokenSource,
> >         String encoding) throws Exception
> > {
> >     // Setup various properties of the XML instance document
> >     xmlTokenSource.documentProperties().setEncoding(encoding);
> >     xmlTokenSource.documentProperties().setVersion("1.0");
> >     XmlOptions xmlOptions = new XmlOptions();
> >     xmlOptions.setCharacterEncoding(encoding);
> >     xmlOptions.setUseDefaultNamespace();
> >     xmlOptions.setSaveAggressiveNamespaces();
> >
> >     // Format to a buffer and read it back into a string
> >     ByteArrayOutputStream bos = new ByteArrayOutputStream();
> >     xmlTokenSource.save(bos, xmlOptions);
> >     // Decode with the encoding used for saving; a bare toString()
> >     // would use the platform default charset instead
> >     return bos.toString(encoding);
> > }
> >
> >
> > -----Original Message-----
> > From: Wendell, Christian [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, October 19, 2005 4:31 AM
> > To: [email protected]
> > Cc: Kekkonen, Jari
> > Subject: character sets
> >
> >
> > Hi
> >
> > We have a Struts/tomcat solution, the jsp files using the 8859-1
> > character set, on top of an Oracle db, which uses UTF-8/Unicode. Of
> > course, in Scandinavia, we have Scandinavian characters. The problem
> > is putting stuff from the html input boxes into the db, and reading
> > stuff from the db to the page, so that the umlaut chars work.
> >
> > The bean models we use are generated from schemas with XMLBean tools.
> > In the action that reads stuff from the page and sends it to the db,
> > we try to put the encoding into the XML header, but fail:
> >
> > XmlOptions opts= new XmlOptions();
> > /* we'd also like to encode '>', using the newest devel version,
> > which also has no effect:
> > XmlOptionCharEscapeMap escapes= new XmlOptionCharEscapeMap();
> > escapes.addMapping('>', XmlOptionCharEscapeMap.PREDEF_ENTITY);
> > opts.setSaveSubstituteCharacters(escapes);
> > */
> > opts.setCharacterEncoding("ISO-8859-1");
> > Note addedNote= Note.Factory.newInstance(opts); //The bean
> > addedNote.setNote(text); //text contains umlaut characters from the jsp
> >
> > Logging the xml structure, we can't see any difference in the
> > generated xml whether we do setCharacterEncoding() or not.
> >
> > Is our strategy right but implementation wrong, or should we do this
> > somehow else?
> >
> > Christian
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
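
For anyone reading this thread later: the symptom described in it (db text that is fine in the logs but shows wrong characters on the page) is what you typically get when UTF-8 bytes are decoded as ISO-8859-1 somewhere along the way. Below is a minimal sketch of that effect in plain Java. It is not code from the application discussed here; the class EncodingDemo and its helper methods are made up for illustration, and it only assumes the standard JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from XMLBeans or Tomcat.
class EncodingDemo {

    // Simulates the bug: text is encoded as UTF-8 but decoded as ISO-8859-1.
    static String mangle(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses mangle(): recover the original bytes, then decode them as UTF-8.
    static String repair(String mangled) {
        byte[] bytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads a byte buffer back with an explicit charset,
    // instead of relying on the platform default.
    static String bufferToString(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(utf8, 0, utf8.length);
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A Finnish word with two a-umlauts, written with Unicode escapes
        String word = "p\u00e4iv\u00e4";
        String mangled = mangle(word);
        System.out.println(mangled.length());                  // 7: each umlaut became two chars
        System.out.println(repair(mangled).equals(word));      // true
        System.out.println(bufferToString(word).equals(word)); // true
    }
}
```

The point of the sketch is only that the bytes and the charset used to decode them have to agree at every hop (request, JVM default, response), which is presumably what the three steps at the top of this thread achieve between them.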

