Also, Christian, could you try using "ISO8859_1" rather than
"ISO-8859-1" as the encoding string in your code? XmlBeans uses the
Java names for the encodings, which seemed more consistent than using
the IANA names (so you can get the encoding name from other Java code
and pass it in directly). Of course, the generated documents always use
the IANA names.
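
To make the two naming schemes concrete, here is a minimal sketch that uses
only the JDK, not XmlBeans. The class and method names (EncodingNames,
javaName, ianaName) are made up for illustration. It relies on
InputStreamReader.getEncoding() reporting the historical Java name and on
Charset.name() reporting the canonical name, which matches the IANA name
for this charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of XmlBeans: converts between the two
// spellings of a charset name using only JDK classes.
class EncodingNames {

    // Returns the historical java.io name (e.g. "ISO8859_1") for any
    // name or alias the JDK recognizes.
    static String javaName(String anyName) {
        try {
            return new InputStreamReader(
                    new ByteArrayInputStream(new byte[0]), anyName).getEncoding();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + anyName, e);
        }
    }

    // Returns the canonical java.nio name (e.g. "ISO-8859-1") for any
    // name or alias the JDK recognizes.
    static String ianaName(String anyName) {
        return Charset.forName(anyName).name();
    }

    public static void main(String[] args) {
        System.out.println(javaName("ISO-8859-1"));
        System.out.println(ianaName("ISO8859_1"));
    }
}
```

Both spellings resolve to the same Charset inside the JDK, so a helper like
this only matters when a library insists on one particular spelling.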

Let us know how that works,
Radu

PS Regarding the use of setSaveSubstituteCharacters(): as the name
indicates, this is a save-time XmlOption, so it has no effect when
passed as an argument to newInstance(). It only takes effect when passed
to xmlText() or a similar save method. The reason is that the infoset
does not distinguish between a character represented as an entity and
the same character represented as a literal value.
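
The infoset point can be checked without XmlBeans. The sketch below uses
the JDK's own DOM parser; the class name InfosetDemo and the helper textOf
are made up for illustration. It parses the same content written once with
an entity and once with the literal character, and shows that the parsed
text is identical, which is why the choice of representation can only be
made at save time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class: shows that entity vs. literal is a
// serialization detail that does not survive parsing.
class InfosetDemo {

    // Parses the given XML string and returns the text content of the
    // root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String withEntity = textOf("<note>a &gt; b</note>");
        String withLiteral = textOf("<note>a > b</note>");
        // Both parse to the same characters.
        System.out.println(withEntity.equals(withLiteral));
    }
}
```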

-----Original Message-----
From: Steve Davis [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 19, 2005 7:12 AM
To: [email protected]
Subject: RE: character sets


I am not an expert in this area, but the following code may help:

  import java.io.ByteArrayOutputStream;
  import org.apache.xmlbeans.XmlOptions;
  import org.apache.xmlbeans.XmlTokenSource;

  /**
   * Gets the formatted character-encoded string representation of an
   * XmlTokenSource.
   * @param xmlTokenSource - typically an XmlBean object
   * @param encoding - the desired character encoding
   * @return String ready for transmission
   * @throws Exception
   */
  public static String getEncodedXmlText(XmlTokenSource xmlTokenSource,
      String encoding)
      throws Exception
  {
    // Setup various properties of the XML instance document
    xmlTokenSource.documentProperties().setEncoding(encoding);
    xmlTokenSource.documentProperties().setVersion("1.0");
    XmlOptions xmlOptions = new XmlOptions();
    xmlOptions.setCharacterEncoding(encoding);
    xmlOptions.setUseDefaultNamespace();
    xmlOptions.setSaveAggressiveNamespaces();

    // Format to a buffer and read it back into a string
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    xmlTokenSource.save(bos, xmlOptions);
    // Decode with the same encoding that was used to save; the
    // no-argument toString() would use the platform default charset.
    return bos.toString(encoding);
  }
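
One detail worth checking in a helper like the one above is how the byte
buffer is turned back into a String. The following JDK-only sketch (the
class and method names are made up for illustration) shows that the same
character occupies a different number of bytes in ISO-8859-1 and UTF-8,
and that decoding with a different charset than the one used for encoding
does not give the original text back.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical demo class: encodes and decodes text with named charsets.
class EncodedBuffer {

    // Encodes text to bytes using the named charset.
    static byte[] encode(String text, String encoding) {
        try {
            return text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    // Decodes bytes to text using the named charset.
    static String decode(byte[] bytes, String encoding) {
        try {
            return new String(bytes, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + encoding, e);
        }
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4"; // a with diaeresis
        byte[] latin1 = encode(umlaut, "ISO8859_1");
        byte[] utf8 = encode(umlaut, "UTF-8");
        System.out.println(latin1.length + " byte(s) in ISO-8859-1");
        System.out.println(utf8.length + " byte(s) in UTF-8");
        // Round trip succeeds only when both sides use the same charset.
        System.out.println(decode(latin1, "ISO8859_1").equals(umlaut));
        System.out.println(decode(latin1, "UTF-8").equals(umlaut));
    }
}
```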


-----Original Message-----
From: Wendell, Christian [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, October 19, 2005 4:31 AM
To: [email protected]
Cc: Kekkonen, Jari
Subject: character sets


Hi 

We have a Struts/Tomcat solution, with the JSP files using the
ISO-8859-1 character set, on top of an Oracle db that uses
UTF-8/Unicode. Of course, in Scandinavia, we have Scandinavian
characters. The problem is putting text from the HTML input boxes into
the db, and reading it back from the db to the page, so that the umlaut
characters work.

The bean models we use are generated from schemas with the XMLBeans
tools. In the action that reads input from the page and sends it to the
db, we try to put the encoding into the XML header, but fail:

  XmlOptions opts = new XmlOptions();
  /* We'd also like to encode '>', using the newest devel version, which
     also has no effect:
    XmlOptionCharEscapeMap escapes = new XmlOptionCharEscapeMap();
    escapes.addMapping('>', XmlOptionCharEscapeMap.PREDEF_ENTITY);
    opts.setSaveSubstituteCharacters(escapes);
  */
  opts.setCharacterEncoding("ISO-8859-1");
  Note addedNote = Note.Factory.newInstance(opts);  // the bean
  addedNote.setNote(text);  // text contains umlaut characters from the JSP

When we log the XML structure, the generated XML looks the same whether
or not we call setCharacterEncoding().

Is our strategy right but the implementation wrong, or should we do this
some other way?

Christian

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




