Neil,

Thanks for the help.  That was definitely it.  Now my question is whether
this is the right approach.  Since I am writing a wrapper around this
functionality, I need to be able to serialize a node as text and deliver it
as a modified PStr (a 4-byte length prefix instead of one byte) in the
native code page (because it will eventually make its way to a different
primary application).  So, whatever platform I happen to be on, I should be
able to give the user the "xml" associated with a given node.
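
To make the target format concrete, here is a minimal sketch of the
"modified PStr" packing step, independent of Xerces.  The names
packLongPStr and longPStrLength are hypothetical helpers, not part of any
library, and the big-endian order of the length prefix is an assumption;
the receiving application may expect something else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: prefix a byte sequence with a 4-byte length.
// Big-endian order is an assumption made for this sketch only.
std::vector<unsigned char> packLongPStr(const unsigned char* bytes,
                                        std::size_t len)
{
    assert(len <= 0xFFFFFFFFu);  // the length must fit in 4 bytes

    const std::uint32_t n = static_cast<std::uint32_t>(len);
    std::vector<unsigned char> out;
    out.reserve(len + 4);

    // Length prefix, most significant byte first.
    out.push_back(static_cast<unsigned char>((n >> 24) & 0xFF));
    out.push_back(static_cast<unsigned char>((n >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((n >> 8) & 0xFF));
    out.push_back(static_cast<unsigned char>(n & 0xFF));

    // Payload bytes follow unchanged; no terminating NUL is added.
    out.insert(out.end(), bytes, bytes + len);
    return out;
}

// Hypothetical helper: read the 4-byte big-endian length back out.
std::uint32_t longPStrLength(const std::vector<unsigned char>& buf)
{
    assert(buf.size() >= 4);  // need at least the prefix
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) << 8) |
            static_cast<std::uint32_t>(buf[3]);
}
```

The bytes passed in would be the serialized node after it has been
converted to the native code page; this sketch only covers the length
prefix and leaves the encoding question open.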

So, getRawBuffer() returns the internal raw buffer, but I assume that this
is not necessarily in the native code page.

Any thoughts?

Thanks.



From:     "Neil Graham" <[EMAIL PROTECTED]>
Date:     03/20/2003 04:52 PM
Please respond to xerces-c-dev
To:       [EMAIL PROTECTED]
cc:
Subject:  Re: Encoding scheme with DOMWriter




Hi Elliot,

The last two lines of your code seem to be the most interesting:

            XMLCh* tempxmltext=theSerializer->writeToString(*m_Node);

The documentation of the writeToString method [1] quite clearly states that
the output will be in UTF-16 and that the document's encoding will be
ignored; so that's why you're seeing UTF-16 output.

      char* xmlchartext = XMLString::transcode(tempxmltext);

And this is transcoding your text for you, but the previous step already
overwrote the encoding information; that is, encoding="UTF-16" is already
there and this is just transcoding that string, like any other.

If you want a sequence of bytes in memory, why not use the DOMWriter's
writeNode method to write to a MemBufFormatTarget, then use getRawBuffer
to get yourself an array of XMLByte (which is typedef'd to unsigned char).
Hopefully that'll meet whatever need is compelling you to want a UTF-8 char
array in memory.

Hope that helps,
Neil

[1]:  http://xml.apache.org/xerces-c/apiDocs/classDOMWriter.html#z272_12

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]




From:     [EMAIL PROTECTED]
Date:     03/20/2003 05:10 PM
Please respond to xerces-c-dev
To:       [EMAIL PROTECTED]
cc:
Subject:  Encoding scheme with DOMWriter





I am having trouble with the writeToString function of the DOMWriter.
Basically, when I load an xml file using XercesDOMParser parse(path), and
then print the information using writeToString, the decl shows up with the
wrong encoding.  The odd thing is that the DOMDocument getEncoding()
function indicates the correct encoding shown in the document.

For example, in the DOMPrint example, the following:

<?xml version="1.0" encoding="UTF-8"?>
<personnel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation='personal.xsd'>

shows up as:

<?xml version="1.0" encoding="UTF-16" standalone="no" ?><personnel
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="personal.xsd">

Am I doing something wrong?  Based on the DOMPrint example, it seems as
though I shouldn't have to explicitly set the encoding if it is called out
in the document itself.

The code is a little convoluted as I am building a wrapper dll around
xerces to use it in another program, but here is the part that writes the
xml to string:

      XMLCh tempStr[100];
      XMLString::transcode("LS", tempStr, 99);
      DOMImplementation *impl =
          DOMImplementationRegistry::getDOMImplementation(tempStr);
      DOMWriter *theSerializer =
          ((DOMImplementationLS*)impl)->createDOMWriter();
      char *outputencoding =
          XMLString::transcode(theSerializer->getEncoding());
      XMLCh* tempxmltext = theSerializer->writeToString(*m_Node);
      char* xmlchartext = XMLString::transcode(tempxmltext);

Thanks!



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




