http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23226

DOMWriter doesn't escape newlines when serializing attribute values

           Summary: DOMWriter doesn't escape newlines when serializing attribute values
           Product: Xerces-C++
           Version: 2.3.0
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Major
          Priority: Other
         Component: DOM
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


In keeping with section 3.3.3 of the XML 1.0 Specification, 2nd Edition (http://www.w3.org/TR/REC-xml#AVNormalize), the parser normalizes literal newline characters in attribute values to spaces, while character references to newlines are expanded to actual newline characters without being normalized. Because the DOMWriter doesn't generate character references for actual newlines in attribute values, newlines in attribute values do not survive a serialize/parse round trip.

This is what happens:

1. Parse an input document with an attribute value containing character references (&#xA;) to newline characters. These are translated into actual newline characters and presented to the app.
2. Serialize the parsed document using DOMWriter. The output document will contain actual newline characters in place of the original character references.
3. Parse the document created in step 2. The newline characters in the input document will be normalized to space characters.
4. Serialize the document parsed in step 3. The output document will contain space characters in place of the newline character references in the original input document.

To avoid this behavior, DOMWriter should generate character references for any newlines it encounters in attribute values.

Here is the text of a test program demonstrating this behavior:

---
/*
** Test app showing problem with DOMWriter and normalized attribute content.
** In keeping with section 3.3.3 of the XML 1.0 Specification, 2nd Edition
** (http://www.w3.org/TR/REC-xml#AVNormalize), the parser normalizes literal
** newline characters in attribute values to spaces, while newline character
** references are expanded without normalization. Because the DOMWriter
** doesn't generate character references for actual newlines in attribute
** values, this breaks round-tripping, as demonstrated by this program.
**
** This is what happens:
**
** 1. Parse an input document with an attribute value containing references
**    (&#xA;) to newline characters. These are translated into actual
**    newline characters and presented to the app.
** 2. Serialize the parsed document using DOMWriter. The output document
**    will contain actual newline characters in place of the original
**    character references.
** 3. Parse the document created in step 2. The newline characters in the
**    input document will be normalized to space characters.
** 4. Serialize the document parsed in step 3. The output document will
**    contain space characters in place of the newline character references
**    in the original input document.
**
** To avoid this behavior, DOMWriter should generate character references for
** any newlines it encounters in attribute values.
*/

#include <iostream>
using namespace std;

#include <xercesc/framework/LocalFileFormatTarget.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOMImplementation.hpp>
#include <xercesc/dom/DOMImplementationLS.hpp>
#include <xercesc/dom/DOMWriter.hpp>
#include <xercesc/dom/DOMException.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/XMLUni.hpp>
#include <xercesc/sax/SAXException.hpp>

XERCES_CPP_NAMESPACE_USE

// The L"..." literals below assume XMLCh is wchar_t, as in Visual C++ builds
// on Windows; other platforms would need XMLString::transcode() instead.

int main(int argc, char *argv[])
{
    int nRetVal = 0;
    XMLPlatformUtils::Initialize();

    try
    {
        // Create a DOMWriter instance.
        DOMImplementation* pImpl =
            DOMImplementationRegistry::getDOMImplementation(L"LS");
        DOMWriter* pDOMWriter =
            (dynamic_cast<DOMImplementationLS*>(pImpl))->createDOMWriter();
        if (pDOMWriter->canSetFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true))
        {
            pDOMWriter->setFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true);
        }

        XercesDOMParser parser;
        DOMDocument* pDocument;

        // Parse initial input document with newline character references.
        parser.parse(L"TestInput1.xml");
        if ((pDocument = parser.getDocument()) != 0)
        {
            // Serialize the parsed document. The output document will
            // contain actual newline characters. The format target is
            // flushed and closed when lfft goes out of scope.
            LocalFileFormatTarget lfft(L"TestOutput1.xml");
            pDOMWriter->writeNode(&lfft, *pDocument);
        }

        // Now parse the serialized document.
        parser.parse(L"TestOutput1.xml");
        if ((pDocument = parser.getDocument()) != 0)
        {
            // Re-serialize the document. In this version, the newlines
            // will be replaced with spaces.
            LocalFileFormatTarget lfft(L"TestOutput2.xml");
            pDOMWriter->writeNode(&lfft, *pDocument);
        }

        pDOMWriter->release();
    }
    catch (...)
    {
        cerr << "Unexpected exception." << endl;
        nRetVal = 1;
    }

    XMLPlatformUtils::Terminate();
    return nRetVal;
}
---

Here is TestInput1.xml:

---
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<policy attr="Line1&#xA;Line2&#xA;Line3&#xA;"/>
---

Here is what you get for TestOutput1.xml:

---
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<policy attr="Line1
Line2
Line3
"/>
---

Here is what you get for TestOutput2.xml:

---
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<policy attr="Line1 Line2 Line3 "/>
---
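The round trip described in this report can be modelled outside Xerces. The following is a minimal sketch, not Xerces code: normalizeAttr is a hypothetical, simplified model of the section 3.3.3 rules for a CDATA attribute (ASCII code points only, no DTD-declared entities, no end-of-line handling of CR/LF pairs), and escapeAttr is a hypothetical serializer helper whose escapeWhitespace flag switches between the behavior reported for DOMWriter and the proposed fix of writing newlines (and, in this sketch, CR and tab) as character references.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical, simplified model of XML 1.0 section 3.3.3 for a CDATA
// attribute: each literal #x20, #x9, #xD or #xA becomes #x20, while a
// character reference is replaced by the character it names and is NOT
// normalized. Assumes well-formed input and ASCII-range code points; entity
// names other than the five predefined ones are dropped.
std::string normalizeAttr(const std::string& raw)
{
    std::string out;
    for (std::string::size_type i = 0; i < raw.size(); ++i) {
        const char c = raw[i];
        if (c == '&') {
            const std::string::size_type semi = raw.find(';', i);
            if (semi == std::string::npos)
                break;  // malformed reference: stop processing
            const std::string ref = raw.substr(i + 1, semi - i - 1);
            if (ref.size() > 1 && ref[0] == '#') {
                // &#xA; (hexadecimal) or &#10; (decimal)
                const long cp = (ref[1] == 'x')
                    ? std::strtol(ref.c_str() + 2, nullptr, 16)
                    : std::strtol(ref.c_str() + 1, nullptr, 10);
                out += static_cast<char>(cp);
            } else if (ref == "amp")  out += '&';
            else if (ref == "lt")     out += '<';
            else if (ref == "gt")     out += '>';
            else if (ref == "quot")   out += '"';
            else if (ref == "apos")   out += '\'';
            i = semi;  // the loop increment then moves past the ';'
        } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            out += ' ';  // literal whitespace is normalized to a space
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical serializer helper for a double-quoted attribute value.
// With escapeWhitespace == false, literal newlines pass through (the
// reported behavior); with true, they are written as character references.
std::string escapeAttr(const std::string& value, bool escapeWhitespace)
{
    std::string out;
    for (const char c : value) {
        if (c == '&')                            out += "&amp;";
        else if (c == '<')                       out += "&lt;";
        else if (c == '"')                       out += "&quot;";
        else if (escapeWhitespace && c == '\n')  out += "&#xA;";
        else if (escapeWhitespace && c == '\r')  out += "&#xD;";
        else if (escapeWhitespace && c == '\t')  out += "&#x9;";
        else                                     out += c;
    }
    return out;
}
```

With escapeWhitespace set to false, normalizeAttr(escapeAttr(v, false)) turns "Line1\nLine2\nLine3\n" into "Line1 Line2 Line3 ", matching TestOutput2.xml; with it set to true, the serialized text is "Line1&#xA;Line2&#xA;Line3&#xA;" and the value survives the round trip unchanged.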
