Hi Elliot,

I'd defer to PeiYong on this one, since he's done most of the serialization work. But my thought is that the rawBuffer would be in whatever encoding you set on the Document that you serialized into the MemBufFormatTarget (or, of course, the encoding of the original document if you haven't touched the encoding explicitly). So if you can determine what the native encoding is, you should be able to induce the rawBuffer to reflect that encoding.

I am curious about one thing though: If you're producing XML, then why do you care what encoding the byte stream is in? If you're producing XML for another application then it follows that application must be XML-aware; if so, it must understand UTF-8 and UTF-16--at least that's what the XML spec implies. If it understands UTF-16, then the original array of XMLChars that writeToString handed to you should be enough...

Cheers!
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]

From: [EMAIL PROTECTED]
Date: 03/21/2003 03:27 PM
Please respond to: xerces-c-dev
To: [EMAIL PROTECTED]
Subject: Re: Encoding scheme with DOMWriter

Neil,

Thanks for the help. That was definitely it. Now my question is whether this is what I should be doing. Basically, since I am writing a wrapper around this functionality, I need to be able to serialize a node as text and deliver it as a modified PStr (4 byte length instead of one) in native code page (b/c it will eventually make its way to a different primary application). So, whatever platform I happen to be on, I should be able to give the user the "xml" associated with a given node. So, getRawBuffer() returns the internal raw buffer, but I assume that this is not necessarily in the native code page. Any thoughts?

Thanks.

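As a rough illustration of the "modified PStr" packaging described above, here is a minimal sketch that prefixes a byte buffer with a 4-byte length. The helper name makeLongPStr is hypothetical and not part of Xerces-C, and storing the length in host byte order is an assumption, since the thread does not say what layout the receiving application expects. It does no transcoding; the bytes are copied as given.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a Xerces-C API): copies `len` bytes into a new
// buffer that starts with a 4-byte length field instead of the 1-byte
// length of a classic Pascal string.
// Assumption: the length is stored in host byte order.
std::vector<unsigned char> makeLongPStr(const unsigned char* bytes, std::size_t len)
{
    if (len > std::numeric_limits<std::uint32_t>::max())
        throw std::length_error("payload does not fit a 4-byte length prefix");

    const std::uint32_t prefix = static_cast<std::uint32_t>(len);
    std::vector<unsigned char> out(sizeof(prefix) + len);

    // Length field first, then the payload bytes unchanged.
    std::memcpy(out.data(), &prefix, sizeof(prefix));
    if (len != 0)
        std::memcpy(out.data() + sizeof(prefix), bytes, len);
    return out;
}
```

With a MemBufFormatTarget named target, the call would presumably look like makeLongPStr(target.getRawBuffer(), target.getLen()), provided the buffer is already in the encoding the receiving application wants.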
From: "Neil Graham" <[EMAIL PROTECTED]>
Date: 03/20/2003 04:52 PM
Please respond to: xerces-c-dev
To: [EMAIL PROTECTED]
Subject: Re: Encoding scheme with DOMWriter

Hi Elliot,

The last two lines of your code seem to be the most interesting:

    XMLCh* tempxmltext=theSerializer->writeToString(*m_Node);

The documentation of the writeToString method [1] quite clearly states that the output will be in UTF-16 and that the document's encoding will be ignored; so that's why you're seeing UTF-16 output.

    char* xmlchartext = XMLString::transcode(tempxmltext);

And this is transcoding your text for you, but the previous step already overwrote the encoding information; that is, encoding="UTF-16" is already there and this is just transcoding that string, like any other.

If you want a sequence of bytes in memory, why not use the DOMWriter's writeNode method to write to a MemBufFormatTarget, then use getRawBuffer to get yourself an array of XMLBytes (which is typedef'd to unsigned char)? Hopefully that'll meet whatever need is compelling you to want a UTF-8 char array in memory.

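To make the point about transcoding concrete, here is a small self-contained sketch. The function utf16ToUtf8 is a hypothetical stand-in written for this illustration; it is not XMLString::transcode, which targets the local code page rather than UTF-8. What it shows is only that converting the code units of an already-serialized string leaves the text of the XML declaration untouched, so encoding="UTF-16" survives the conversion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a transcoder (not a Xerces-C API): converts
// UTF-16 code units to UTF-8 bytes. It treats the XML declaration as
// ordinary text and never rewrites it.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Feeding it a string that begins with an XML declaration naming UTF-16 yields UTF-8 bytes whose declaration still names UTF-16, which is the mismatch described in the message above.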
Hope that helps,
Neil

[1]: http://xml.apache.org/xerces-c/apiDocs/classDOMWriter.html#z272_12

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]

From: [EMAIL PROTECTED]
Date: 03/20/2003 05:10 PM
Please respond to: xerces-c-dev
To: [EMAIL PROTECTED]
Subject: Encoding scheme with DOMWriter

I am having trouble with the writeToString function of the DOMWriter. Basically, when I load an xml file using XercesDOMParser parse(path), and then print the information using writeToString, the decl shows up with the wrong encoding. The odd thing is that the DOMDocument getEncoding() function indicates the correct encoding shown in the document. For example, in the DOMPrint example, the following:

    <?xml version="1.0" encoding="UTF-8"?>
    <personnel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='personal.xsd'>

shows up as:

    <?xml version="1.0" encoding="UTF-16" standalone="no" ?><personnel xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="personal.xsd">

Am I doing something wrong? Based on the DOMPrint example, it seems as though I shouldn't have to explicitly set the encoding if it is called out in the document itself.

The code is a little convoluted as I am building a wrapper dll around xerces to use it in another program, but here is the part that writes the xml to string:

    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMWriter *theSerializer = ((DOMImplementationLS*)impl)->createDOMWriter();
    char *outputencoding = XMLString::transcode(theSerializer->getEncoding());
    XMLCh* tempxmltext=theSerializer->writeToString(*m_Node);
    char* xmlchartext = XMLString::transcode(tempxmltext);

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]