DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5516>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5516

Corrupted Strings when mapping Xerces DOM to Xalan DOM

           Summary: Corrupted Strings when mapping Xerces DOM to Xalan DOM
           Product: XalanC
           Version: 1.2.x
          Platform: Alpha
        OS/Version: Other
            Status: NEW
          Severity: Major
          Priority: Other
         Component: XalanC
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


When using the Xerces C++ parser (1.5.2) to parse a document and then the Xalan 
XSLT engine to transform the DOM, we have to map the Xerces DOM to a Xalan DOM as 
follows:

  const DOM_Document theDOM = parser.getDocument();
  XercesDOMSupport theDOMSupport;
  XercesParserLiaison theParserLiaison(theDOMSupport);
  XalanDocument* xalanDoc = theParserLiaison.createDocument(theDOM);

Doing so, XercesParserLiaison appears to corrupt the ends of strings: garbage 
characters are randomly appended to the original strings. This is especially 
visible with large files and long text elements.

After some investigation, we managed to correct the problem by changing the 
following method calls in the XercesParserLiaison component:

from: getPooledString(xyz.rawBuffer());
to:   getPooledString(xyz.rawBuffer(), xyz.length());
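
The symptom and the fix are consistent with rawBuffer() returning storage that 
is not null-terminated, so that a one-argument call has to guess where the 
string ends. That cause is an assumption here, not something we verified in the 
Xerces sources. The sketch below only simulates the effect with plain C++: 
copyUntilTerminator, copyWithLength, kStorage and kLogicalLength are 
hypothetical names, not Xalan or Xerces APIs.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a one-argument pool call: it trusts a
// terminator and scans forward until it meets a zero code unit.
std::u16string copyUntilTerminator(const char16_t* raw)
{
    std::size_t n = 0;
    while (raw[n] != u'\0') {
        ++n;
    }
    return std::u16string(raw, n);
}

// Hypothetical stand-in for the two-argument form: copy exactly
// `length` code units and never look past them.
std::u16string copyWithLength(const char16_t* raw, std::size_t length)
{
    return std::u16string(raw, length);
}

// Simulated storage: the logical string is "abc" (length 3), followed by
// stale characters. The trailing zero only keeps the scan well defined in
// this demo; a real buffer may have no terminator at all.
const char16_t kStorage[] = { u'a', u'b', u'c', u'#', u'?', u'\0' };
const std::size_t kLogicalLength = 3;
```

With this storage, copyUntilTerminator(kStorage) picks up the stale characters 
after "abc", while copyWithLength(kStorage, kLogicalLength) returns only "abc".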

The following files of XercesParserLiaison are impacted by the correction:

XercesDocumentBridge.hpp 
XercesDocumentBridge.cpp 

XercesBridgeNavigator.hpp 
XercesBridgeNavigator.cpp 

XercesAttrBridge.cpp 
XercesCDATASectionBridge.cpp 
XercesCommentBridge.cpp 
XercesDocumentFragmentBridge.cpp 
XercesDocumentTypeBridge.cpp 
XercesElementBridge.cpp 
XercesEntityBridge.cpp 
XercesEntityReferenceBridge.cpp 
XercesNotationBridge.cpp 
XercesProcessingInstructionBridge.cpp 
XercesTextBridge.cpp 

The following shell script and sed modifier might help automate the 
correction (only XercesDocumentBridge and XercesBridgeNavigator still require 
manual editing afterwards).
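
For illustration only, the same rewrite can be expressed as one generic 
pattern rather than one substitution per accessor. This is a sketch of an 
alternative, not the script we actually used: addLengthArgument is a 
hypothetical helper built on std::regex, and it assumes each call sits on a 
single line.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites
//   getPooledString(<expr>.rawBuffer())
// into
//   getPooledString(<expr>.rawBuffer(), <expr>.length())
// where <expr> is whatever precedes the first ".rawBuffer())" on the line.
std::string addLengthArgument(const std::string& source)
{
    // The lazy capture stops at the first ".rawBuffer()" that is directly
    // followed by the closing parenthesis of the getPooledString call.
    static const std::regex call(
        R"(getPooledString\((.+?)\.rawBuffer\(\)\))");

    return std::regex_replace(
        source, call, "getPooledString($1.rawBuffer(), $1.length())");
}
```

A call that already passes a length is left unchanged, because its 
rawBuffer() is followed by a comma rather than a closing parenthesis.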

sed.script:

#!/bin/ksh
# Rewrite every .cpp file that calls getPooledString, keeping a .orig backup.
for f in $(grep -l getPooled *.cpp); do
  mv "$f" "$f.orig"
  sed -f modifier.sed "$f.orig" > "$f"
done

modifier.sed:

s/getPooledString(m_xercesNode.getNodeName().rawBuffer());/getPooledString(m_xercesNode.getNodeName().rawBuffer(), m_xercesNode.getNodeName().length());/g
s/getPooledString(m_xercesNode.getNodeValue().rawBuffer());/getPooledString(m_xercesNode.getNodeValue().rawBuffer(), m_xercesNode.getNodeValue().length());/g
s/getPooledString(m_xercesNode.getNamespaceURI().rawBuffer());/getPooledString(m_xercesNode.getNamespaceURI().rawBuffer(), m_xercesNode.getNamespaceURI().length());/g
s/getPooledString(m_xercesNode.getPrefix().rawBuffer());/getPooledString(m_xercesNode.getPrefix().rawBuffer(), m_xercesNode.getPrefix().length());/g
s/getPooledString(m_xercesNode.getLocalName().rawBuffer());/getPooledString(m_xercesNode.getLocalName().rawBuffer(), m_xercesNode.getLocalName().length());/g
s/getPooledString(m_xercesNode.getName().rawBuffer());/getPooledString(m_xercesNode.getName().rawBuffer(), m_xercesNode.getName().length());/g
s/getPooledString(m_xercesNode.getValue().rawBuffer());/getPooledString(m_xercesNode.getValue().rawBuffer(), m_xercesNode.getValue().length());/g
s/getPooledString(m_xercesNode.getData().rawBuffer());/getPooledString(m_xercesNode.getData().rawBuffer(), m_xercesNode.getData().length());/g
s/getPooledString(m_xercesNode.getPublicId().rawBuffer());/getPooledString(m_xercesNode.getPublicId().rawBuffer(), m_xercesNode.getPublicId().length());/g
s/getPooledString(m_xercesNode.getSystemId().rawBuffer());/getPooledString(m_xercesNode.getSystemId().rawBuffer(), m_xercesNode.getSystemId().length());/g
s/getPooledString(m_xercesNode.getInternalSubset().rawBuffer());/getPooledString(m_xercesNode.getInternalSubset().rawBuffer(), m_xercesNode.getInternalSubset().length());/g
s/getPooledString(m_xercesNode.getNotationName().rawBuffer());/getPooledString(m_xercesNode.getNotationName().rawBuffer(), m_xercesNode.getNotationName().length());/g
s/getPooledString(m_xercesNode.getTarget().rawBuffer());/getPooledString(m_xercesNode.getTarget().rawBuffer(), m_xercesNode.getTarget().length());/g
s/getPooledString(m_xercesNode.getNodeNameImpl().rawBuffer());/getPooledString(m_xercesNode.getNodeNameImpl().rawBuffer(), m_xercesNode.getNodeNameImpl().length());/g
s/getPooledString(m_xercesNode.getNodeValueImpl().rawBuffer());/getPooledString(m_xercesNode.getNodeValueImpl().rawBuffer(), m_xercesNode.getNodeValueImpl().length());/g
s/getPooledString(m_xercesNode.getNamespaceURIImpl().rawBuffer());/getPooledString(m_xercesNode.getNamespaceURIImpl().rawBuffer(), m_xercesNode.getNamespaceURIImpl().length());/g
s/getPooledString(m_xercesNode.getPrefixImpl().rawBuffer());/getPooledString(m_xercesNode.getPrefixImpl().rawBuffer(), m_xercesNode.getPrefixImpl().length());/g
s/getPooledString(m_xercesNode.getLocalNameImpl().rawBuffer());/getPooledString(m_xercesNode.getLocalNameImpl().rawBuffer(), m_xercesNode.getLocalNameImpl().length());/g
s/getPooledString(m_xercesNode.getDataImpl().rawBuffer());/getPooledString(m_xercesNode.getDataImpl().rawBuffer(), m_xercesNode.getDataImpl().length());/g
s/getPooledString(m_xercesNode.getTagNameImpl().rawBuffer());/getPooledString(m_xercesNode.getTagNameImpl().rawBuffer(), m_xercesNode.getTagNameImpl().length());/g
s/getPooledString(m_xercesNode.getAttributeImpl(c_wstr(name)).rawBuffer());/getPooledString(m_xercesNode.getAttributeImpl(c_wstr(name)).rawBuffer(), m_xercesNode.getAttributeImpl(c_wstr(name)).length());/g
s/getPooledString(m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).rawBuffer());/getPooledString(m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).rawBuffer(), m_xercesNode.getAttributeNSImpl(c_wstr(namespaceURI), c_wstr(localName)).length());/g
s/getPooledString(m_xercesDocument.getNodeName().rawBuffer());/getPooledString(m_xercesDocument.getNodeName().rawBuffer(), m_xercesDocument.getNodeName().length());/g
s/getPooledString(m_xercesDocument.getNodeValue().rawBuffer());/getPooledString(m_xercesDocument.getNodeValue().rawBuffer(), m_xercesDocument.getNodeValue().length());/g
s/getPooledString(m_xercesDocument.getNamespaceURI().rawBuffer());/getPooledString(m_xercesDocument.getNamespaceURI().rawBuffer(), m_xercesDocument.getNamespaceURI().length());/g
s/getPooledString(m_xercesDocument.getPrefix().rawBuffer());/getPooledString(m_xercesDocument.getPrefix().rawBuffer(), m_xercesDocument.getPrefix().length());/g
s/getPooledString(m_xercesDocument.getLocalName().rawBuffer());/getPooledString(m_xercesDocument.getLocalName().rawBuffer(), m_xercesDocument.getLocalName().length());/g


Additional Question:

Why are the Xerces DOM and Xalan DOM (and even worse, DOMString and 
XalanDOMString) so poorly integrated anyway? I would have thought that Xalan 
could reuse the Xerces DOM stuff, no?
