DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5516>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5516

Corrupted Strings when mapping Xerces DOM to Xalan DOM

           Summary: Corrupted Strings when mapping Xerces DOM to Xalan DOM
           Product: XalanC
           Version: 1.2.x
          Platform: Alpha
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: Other
         Component: XalanC
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

When using the Xerces C++ parser (1.5.2) to parse a document and then the
Xalan XSLT engine to transform the DOM, we have to map the Xerces DOM to the
Xalan DOM as follows:

    const DOM_Document theDOM = parser.getDocument();
    XercesDOMSupport theDOMSupport;
    XercesParserLiaison theParserLiaison(theDOMSupport);
    XalanDocument* xalanDoc = theParserLiaison.createDocument(theDOM);

Doing so, the XercesParserLiaison code appears to corrupt strings (the ends of
strings, actually: additional rubbish characters are randomly appended to the
original strings). This is especially visible with large files and long text
elements.

After some investigation, we managed to correct the problem by changing the
following method calls in the XercesParserLiaison component:

    from: getPooledString(xyz.rawBuffer());
    to:   getPooledString(xyz.rawBuffer(), xyz.length());

The following files of XercesParserLiaison are impacted by the correction:

    XercesDocumentBridge.hpp
    XercesDocumentBridge.cpp
    XercesBridgeNavigator.hpp
    XercesBridgeNavigator.cpp
    XercesAttrBridge.cpp
    XercesCDATASectionBridge.cpp
    XercesCommentBridge.cpp
    XercesDocumentFragmentBridge.cpp
    XercesDocumentTypeBridge.cpp
    XercesElementBridge.cpp
    XercesEntityBridge.cpp
    XercesEntityReferenceBridge.cpp
    XercesNotationBridge.cpp
    XercesProcessingInstructionBridge.cpp
    XercesTextBridge.cpp

The following sed script and sed modifier might be helpful to automate the
correction (only XercesDocumentBridge and XercesBridgeNavigator still require
manual editing after that).
sed.script:

    #!/bin/ksh
    files=`grep -l getPooled *.cpp`
    for f in $files; do
        mv $f $f.orig
        sed -f modifier.sed $f.orig > $f
    done

modifier.sed:

    s/getPooledString(m_xercesNode.getNodeName().rawBuffer());/getPooledString(m_xercesNode.getNodeName().rawBuffer(), m_xercesNode.getNodeName().length());/g
    s/getPooledString(m_xercesNode.getNodeValue().rawBuffer());/getPooledString(m_xercesNode.getNodeValue().rawBuffer(), m_xercesNode.getNodeValue().length());/g
    s/getPooledString(m_xercesNode.getNamespaceURI().rawBuffer());/getPooledString(m_xercesNode.getNamespaceURI().rawBuffer(), m_xercesNode.getNamespaceURI().length());/g
    s/getPooledString(m_xercesNode.getPrefix().rawBuffer());/getPooledString(m_xercesNode.getPrefix().rawBuffer(), m_xercesNode.getPrefix().length());/g
    s/getPooledString(m_xercesNode.getLocalName().rawBuffer());/getPooledString(m_xercesNode.getLocalName().rawBuffer(), m_xercesNode.getLocalName().length());/g
    s/getPooledString(m_xercesNode.getName().rawBuffer());/getPooledString(m_xercesNode.getName().rawBuffer(), m_xercesNode.getName().length());/g
    s/getPooledString(m_xercesNode.getValue().rawBuffer());/getPooledString(m_xercesNode.getValue().rawBuffer(), m_xercesNode.getValue().length());/g
    s/getPooledString(m_xercesNode.getData().rawBuffer());/getPooledString(m_xercesNode.getData().rawBuffer(), m_xercesNode.getData().length());/g
    s/getPooledString(m_xercesNode.getPublicId().rawBuffer());/getPooledString(m_xercesNode.getPublicId().rawBuffer(), m_xercesNode.getPublicId().length());/g
    s/getPooledString(m_xercesNode.getSystemId().rawBuffer());/getPooledString(m_xercesNode.getSystemId().rawBuffer(), m_xercesNode.getSystemId().length());/g
    s/getPooledString(m_xercesNode.getInternalSubset().rawBuffer());/getPooledString(m_xercesNode.getInternalSubset().rawBuffer(), m_xercesNode.getInternalSubset().length());/g
    s/getPooledString(m_xercesNode.getNotationName().rawBuffer());/getPooledString(m_xercesNode.getNotationName().rawBuffer(), m_xercesNode.getNotationName().length());/g
    s/getPooledString(m_xercesNode.getTarget().rawBuffer());/getPooledString(m_xercesNode.getTarget().rawBuffer(), m_xercesNode.getTarget().length());/g
    s/getPooledString(m_xercesNode.getNodeNameImpl().rawBuffer());/getPooledString(m_xercesNode.getNodeNameImpl().rawBuffer(), m_xercesNode.getNodeNameImpl().length());/g
    s/getPooledString(m_xercesNode.getNodeValueImpl().rawBuffer());/getPooledString(m_xercesNode.getNodeValueImpl().rawBuffer(), m_xercesNode.getNodeValueImpl().length());/g
    s/getPooledString(m_xercesNode.getNamespaceURIImpl().rawBuffer());/getPooledString(m_xercesNode.getNamespaceURIImpl().rawBuffer(), m_xercesNode.getNamespaceURIImpl().length());/g
    s/getPooledString(m_xercesNode.getPrefixImpl().rawBuffer());/getPooledString(m_xercesNode.getPrefixImpl().rawBuffer(), m_xercesNode.getPrefixImpl().length());/g
    s/getPooledString(m_xercesNode.getLocalNameImpl().rawBuffer());/getPooledString(m_xercesNode.getLocalNameImpl().rawBuffer(), m_xercesNode.getLocalNameImpl().length());/g
    s/getPooledString(m_xercesNode.getDataImpl().rawBuffer());/getPooledString(m_xercesNode.getDataImpl().rawBuffer(), m_xercesNode.getDataImpl().length());/g
    s/getPooledString(m_xercesNode.getTagNameImpl().rawBuffer());/getPooledString(m_xercesNode.getTagNameImpl().rawBuffer(), m_xercesNode.getTagNameImpl().length());/g
    s/getPooledString(m_xercesNode.getAttributeImpl(c_wstr(name)).rawBuffer());/getPooledString(m_xercesNode.getAttributeImpl(c_wstr(name)).rawBuffer(), m_xercesNode.getAttributeImpl(c_wstr(name)).length());/g
    s/getPooledString(m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).rawBuffer());/getPooledString(m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).rawBuffer(), m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).length());/g
    s/getPooledString(m_xercesDocument.getNodeName().rawBuffer());/getPooledString(m_xercesDocument.getNodeName().rawBuffer(), m_xercesDocument.getNodeName().length());/g
    s/getPooledString(m_xercesDocument.getNodeValue().rawBuffer());/getPooledString(m_xercesDocument.getNodeValue().rawBuffer(), m_xercesDocument.getNodeValue().length());/g
    s/getPooledString(m_xercesDocument.getNamespaceURI().rawBuffer());/getPooledString(m_xercesDocument.getNamespaceURI().rawBuffer(), m_xercesDocument.getNamespaceURI().length());/g
    s/getPooledString(m_xercesDocument.getPrefix().rawBuffer());/getPooledString(m_xercesDocument.getPrefix().rawBuffer(), m_xercesDocument.getPrefix().length());/g
    s/getPooledString(m_xercesDocument.getLocalName().rawBuffer());/getPooledString(m_xercesDocument.getLocalName().rawBuffer(), m_xercesDocument.getLocalName().length());/g

Additional Question:

Why are the Xerces DOM and Xalan DOM (and even worse, DOMString and
XalanDOMString) so poorly integrated anyway? I would have thought that Xalan
could reuse the Xerces DOM stuff, no?
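Illustration of the from/to change reported above. This is a minimal sketch, not Xalan or Xerces code: copyByScanning and copyWithLength are hypothetical stand-ins for the one-argument and two-argument forms of getPooledString, and the sketch assumes that the symptom comes from the raw buffer not being terminated at the logical end of the string. The storage array deliberately keeps a trailing 0 after the stale characters so that the scanning version stays within bounds in this demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the one-argument call: it has no length, so it
// scans forward until it finds a 0 terminator. If the buffer is not
// terminated at the logical end of the string, it picks up whatever follows.
std::u16string copyByScanning(const char16_t* buffer)
{
    std::size_t n = 0;
    while (buffer[n] != 0) {
        ++n;
    }
    return std::u16string(buffer, n);
}

// Hypothetical stand-in for the two-argument call: it copies exactly
// `length` characters and never looks for a terminator.
std::u16string copyWithLength(const char16_t* buffer, std::size_t length)
{
    return std::u16string(buffer, length);
}

// Small self-check: the logical text is "abc", but the storage still holds
// stale characters after it, as a reused buffer might.
void selfCheck()
{
    const char16_t storage[] = { u'a', u'b', u'c', u'X', u'Y', u'Z', 0 };
    const std::size_t logicalLength = 3;

    // The scanning copy appends the stale characters to the result.
    assert(copyByScanning(storage).size() == 6);

    // The length-bounded copy returns only the logical text.
    assert(copyWithLength(storage, logicalLength) == std::u16string(u"abc"));
}
```

In real code nothing guarantees that a terminator follows the stale characters at all, so the scanning form may read arbitrarily far past the string. Passing the length explicitly avoids the dependency on a terminator.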
