Re: Encoding scheme with DOMWriter

Neil Graham Thu, 20 Mar 2003 14:50:47 -0800

Hi Elliot,

The last two lines of your code seem to be the most interesting:


            XMLCh* tempxmltext=theSerializer->writeToString(*m_Node);

The documentation of the writeToString method [1] quite clearly states that
the output will be in UTF-16 and that the document's encoding will be
ignored; so that's why you're seeing UTF-16 output.

      char* xmlchartext = XMLString::transcode(tempxmltext);

And this is transcoding your text for you, but the previous step already
overwrote the encoding information; that is, encoding="UTF-16" is already
in the declaration, and this call just transcodes that string like any other.
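
To illustrate the point without Xerces at all, here is a minimal sketch
using only the standard library. The helper utf16ToUtf8 is hypothetical
(it is not a Xerces function, and XMLString::transcode actually targets the
local code page rather than UTF-8), but the effect on the declaration is
the same: the code units change width while the characters stay as they
were, so the text still says UTF-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Xerces: re-encode UTF-16 code units as
// UTF-8 bytes. It performs no validation; an unpaired surrogate is simply
// encoded as if it were an ordinary code point.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {                       // 1 byte (ASCII)
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Passing the serialized declaration through this helper returns the same
characters, including encoding="UTF-16"; only the byte representation
differs. To get a different declaration, the serializer itself has to
write in the target encoding.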

If you want a sequence of bytes in memory, why not use the DOMWriter's
writeNode method to write to a MemBufFormatTarget, then use getRawBuffer
to get yourself an array of XMLByte (which is typedef'd to unsigned char)?
Hopefully that'll meet whatever need is compelling you to want a UTF-8 char
array in memory.
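
A rough sketch of that approach, assuming the Xerces-C 2.x API used in this
thread (in Xerces-C 3.x DOMWriter was replaced by DOMLSSerializer, so this
will not compile there). The function name serializeToUtf8 is made up for
illustration, and the explicit setEncoding call is an assumption on my part
to force UTF-8; without it, writeNode should fall back to the document's own
encoding.

```cpp
#include <string>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/framework/MemBufFormatTarget.hpp>
#include <xercesc/util/XMLString.hpp>

XERCES_CPP_NAMESPACE_USE

// Hypothetical helper: serialize `node` into a std::string of UTF-8 bytes.
// Assumes XMLPlatformUtils::Initialize() has already been called.
std::string serializeToUtf8(const DOMNode& node)
{
    // Look up the Load/Save implementation, as in the original code.
    XMLCh ls[3];
    XMLString::transcode("LS", ls, 2);
    DOMImplementation* impl =
        DOMImplementationRegistry::getDOMImplementation(ls);
    DOMWriter* writer = ((DOMImplementationLS*)impl)->createDOMWriter();

    // Ask for UTF-8 explicitly (assumption: this is the encoding wanted).
    XMLCh utf8[6];
    XMLString::transcode("UTF-8", utf8, 5);
    writer->setEncoding(utf8);

    // writeNode encodes into the target; nothing is transcoded afterwards.
    MemBufFormatTarget target;
    std::string bytes;
    if (writer->writeNode(&target, node)) {
        bytes.assign(reinterpret_cast<const char*>(target.getRawBuffer()),
                     target.getLen());
    }

    writer->release();
    return bytes;
}
```

The buffer returned by getRawBuffer belongs to the MemBufFormatTarget, so
the sketch copies the bytes out before the target goes out of scope.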

Hope that helps,
Neil

[1]:  http://xml.apache.org/xerces-c/apiDocs/classDOMWriter.html#z272_12

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




From:     [EMAIL PROTECTED]
Date:     03/20/2003 05:10 PM
Please respond to xerces-c-dev

To:       [EMAIL PROTECTED]
Subject:  Encoding scheme with DOMWriter




I am having trouble with the writeToString function of the DOMWriter.
Basically, when I load an xml file using XercesDOMParser parse(path), and
then print the information using writeToString, the decl shows up with the
wrong encoding.  The odd thing is that the DOMDocument getEncoding()
function indicates the correct encoding shown in the document.

For example, in the DOMPrint example, the following:

<?xml version="1.0" encoding="UTF-8"?>
<personnel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation='personal.xsd'>

shows up as:

<?xml version="1.0" encoding="UTF-16" standalone="no" ?><personnel
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="personal.xsd">

Am I doing something wrong?  Based on the DOMPrint example, it seems as
though I shouldn't have to explicitly set the encoding if it is called out
in the document itself.

The code is a little convoluted as I am building a wrapper dll around
xerces to use it in another program, but here is the part that writes the
xml to string:

      XMLCh tempStr[100];
      XMLString::transcode("LS", tempStr, 99);
      DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
      DOMWriter *theSerializer = ((DOMImplementationLS*)impl)->createDOMWriter();
      char *outputencoding = XMLString::transcode(theSerializer->getEncoding());
      XMLCh* tempxmltext=theSerializer->writeToString(*m_Node);
      char* xmlchartext = XMLString::transcode(tempxmltext);

Thanks!



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




