Hi,
I'm experiencing a problem with an extension function that I implemented to serialize the content of a node set. I'm using the FormatterToXML class in the Xalan API to produce XML text from the node set. The problem is that the formatter loses all the namespace declarations from the original document. The original prefixes are kept, which only makes things worse: after serialization the result is not well-formed, because the prefixes are no longer declared. I was hoping it was only a matter of setting some parameter, but I can't seem to make it work. The code of the serialization function is:
XObjectPtr XMLNodeSerializer::execute(XPathExecutionContext& executionContext,
                                      XalanNode* context,
                                      const XObjectArgVectorType& args,
                                      const Locator* locator) const {
    // Return an empty string if the arguments don't match.
    if ((args.size() != 1) || args[0].null()) {
        return executionContext.getXObjectFactory()
            .createString(XalanDOMString(""));
    }
    const NodeRefListBase *nodeList = &args[0]->nodeset();
    const NodeRefList::size_type nodeListLength = nodeList->getLength();
    // Nothing to serialize: return an empty string.
    if (nodeListLength == 0) {
        return executionContext.getXObjectFactory()
            .createString(XalanDOMString(""));
    }
    XMLByteStream& outputStream = _stage.getStrStream();
    XMLNodeSerializer* thisObject = const_cast<XMLNodeSerializer*>(this);
    _stage.getStrStream().reset();
    XalanTransformerOutputStream xalanOutputStream(thisObject, &write);
    XalanOutputStreamPrintWriter streamWriter(xalanOutputStream);
    FormatterToXML formatter(streamWriter,                    // Use our stream writer
                             XalanDOMString(),                // Use the default version
                             false,                           // Don't indent the output by default
                             0,                               // Indentation size (overridden below when indenting)
                             XalanDOMString(NODE_ENCODING));  // Encode the result in UTF-16
    // Turn indentation on only when chunk formatting is enabled.
    if (_stage.isChunkFormattingEnabled()) {
        formatter.setDoIndent(true);
        formatter.setIndent(INDENTATION_SIZE);
    }
    // Don't write a header...
    formatter.setShouldWriteXMLHeader(false);
    formatter.setEscapeCData(false);
    formatter.setStripCData(false);
    // It's required that we do this...
    formatter.startDocument();
    FormatterTreeWalker treeWalker(formatter);
    // Traverse the subtree of the document rooted at
    // each node we've selected (attribute nodes are skipped)...
    for (NodeRefList::size_type i = 0; i < nodeListLength; ++i) {
        const XalanNode* const theNode = nodeList->item(i);
        assert(theNode != 0);
        const XalanNode::NodeType theNodeType =
            theNode->getNodeType();
        if (theNodeType != XalanNode::ATTRIBUTE_NODE) {
            treeWalker.traverseSubtree(theNode);
        }
    }
    // It's required that we do this...
    formatter.endDocument();
    // Make sure that the stream is null-terminated.
    XalanDOMChar nullChar = 0;
    outputStream.write(&nullChar, sizeof(XalanDOMChar));
    size_t serializedNodeLength = 0;
    const char* serializedNode =
        (const char*)outputStream.getContent(serializedNodeLength);
    // When building the result string skip the two byte order mark bytes.
    return executionContext.getXObjectFactory()
        .createString(XalanDOMString((const XalanDOMChar*)(serializedNode+2)));
}
If I serialize the root element (asset:assetList) of the following document:
<?xml version="1.0" encoding="UTF-8"?>
<asset:assetList xmlns:asset="http://www.cust.org/Asset" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.cust.org/Asset
I:\development\documentation\training\xmlpack\BestPractices\examples\asset.xsd" xmlns:bla="http://www.bla.com/BlaBlaBla">
<asset:asset name="ASCL" xmlns:bla="http://www.bla.com/YadiYadiYa">
<asset:assetDescription type="short">Ascential</asset:assetDescription>
<asset:assetDescription type="long">Ascential Software Corporation</asset:assetDescription>
<bla:bla>BlaBlaBla</bla:bla>
</asset:asset>
<asset:asset name="MSFT">
<asset:assetDescription type="short">Microsoft</asset:assetDescription>
<asset:assetDescription type="long">Microsoft Corporation</asset:assetDescription>
</asset:asset>
<asset:asset name="ORCL">
<asset:assetDescription type="short">Oracle</asset:assetDescription>
<asset:assetDescription type="long">Oracle Corporation</asset:assetDescription>
</asset:asset>
</asset:assetList>
I get:
<asset:assetList xsi:schemaLocation="http://www.cust.org/Asset I:\development\documentation\training\xmlpack\BestPractices\examples\asset.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace">
<asset:asset name="ASCL">
<asset:assetDescription type="short">Ascential</asset:assetDescription>
<asset:assetDescription type="long">Ascential Software Corporation</asset:assetDescription>
<bla:bla>BlaBlaBla</bla:bla>
</asset:asset>
<asset:asset name="MSFT">
<asset:assetDescription type="short">Microsoft</asset:assetDescription>
<asset:assetDescription type="long">Microsoft Corporation</asset:assetDescription>
</asset:asset>
<asset:asset name="ORCL">
<asset:assetDescription type="short">Oracle</asset:assetDescription>
<asset:assetDescription type="long">Oracle Corporation</asset:assetDescription>
</asset:asset>
</asset:assetList>
I assume this behavior is correct, since I'm creating a new document and then copying the node set into it, but my question is: do you have any idea how I can keep the namespace information? I was thinking of going to the extreme of subclassing the FormatterTreeWalker class so that it keeps track of prefixes/namespaces as it walks the node list and forces each namespace declaration to be written where its prefix is first used. Would you suggest something different?
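To make the idea a bit more concrete, here is a rough sketch of the bookkeeping I have in mind. It is standalone C++ and does not call any Xalan classes: NamespaceScopeTracker, pushElement and popElement are names I made up for illustration, and how the in-scope bindings would actually be collected from a XalanNode and passed to the formatter is left out, since that is the part I'm unsure about.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xalan): remembers which prefix-to-URI
// bindings have already been written on the ancestors of the element
// that is currently being serialized.
class NamespaceScopeTracker {
public:
    typedef std::map<std::string, std::string> Bindings;

    // Call when an element is opened. 'inScope' holds every binding that
    // is visible on that element in the source document. The return value
    // is the subset that still has to be written as xmlns attributes on
    // this element: prefixes not yet declared in the output, or prefixes
    // that are redeclared with a different URI.
    Bindings pushElement(const Bindings& inScope) {
        Bindings current = _stack.empty() ? Bindings() : _stack.back();
        Bindings missing;
        for (Bindings::const_iterator it = inScope.begin();
             it != inScope.end(); ++it) {
            Bindings::const_iterator found = current.find(it->first);
            if (found == current.end() || found->second != it->second) {
                missing[it->first] = it->second;
                current[it->first] = it->second;
            }
        }
        _stack.push_back(current);
        return missing;
    }

    // Call when the element is closed, so that its declarations go out
    // of scope again.
    void popElement() {
        assert(!_stack.empty());
        _stack.pop_back();
    }

private:
    // One entry per open element: the bindings declared so far in the output.
    std::vector<Bindings> _stack;
};
```

With the document above, the root element would get both the asset and bla declarations, the first asset:asset element would only get the redeclared bla prefix, and the following siblings would get nothing extra.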
Many thanks,
Hernando Borda
Software Developer
Ascential Software Corp.
