http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23226

DOMWriter doesn't escape newlines when serializing attribute values

           Summary: DOMWriter doesn't escape newlines when serializing
                    attribute values
           Product: Xerces-C++
           Version: 2.3.0
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Major
          Priority: Other
         Component: DOM
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


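In keeping with section 3.3.3 of the XML 1.0 Specification, 2nd Edition
(http://www.w3.org/TR/REC-xml#AVNormalize), the parser normalizes literal
newline characters in attribute values to spaces, while newline character
references such as &#xa; are preserved as actual newline characters.  Because
the DOMWriter doesn't generate character references for actual newlines in
attribute values, those newlines do not survive serializing the document and
parsing it again: they come back as spaces.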

This is what happens:
   1. Parse an input document with an attribute value containing references
      (&#xa;) to newline characters.  These are translated into actual
      newline characters and presented to the app.
   2. Serialize the parsed document using DOMWriter.  The output document
      will contain actual newline characters in place of the original
      character references.
   3. Parse the document created in step 2.  The newline characters in the
      input document will be normalized to space characters.
   4. Serialize the document parsed in step 3.  The output document will
      contain space characters in place of the newline character references
      in the original input document.
To avoid this behavior, DOMWriter should generate character references for
any newlines it encounters in attribute values.

Here is the text of a test program demonstrating this behavior:
---
/*
** Test app showing problem with DOMWriter and normalized attribute content.
** In keeping with section 3.3.3 of the XML 1.0 Specification, 2nd Edition
** (http://www.w3.org/TR/REC-xml#AVNormalize), the parser normalizes literal
** newline characters in attribute values to spaces, but preserves newline
** character references as actual newlines.  Because the DOMWriter doesn't
** generate character references for actual newlines in attribute values,
** the newlines do not survive a second parse/serialize round trip, as
** demonstrated by this program.
**
** This is what happens:
**
**    1. Parse an input document with an attribute value containing references
**       (&#xa;) to newline characters.  These are translated into actual
**       newline characters and presented to the app.
**    2. Serialize the parsed document using DOMWriter.  The output document
**       will contain actual newline characters in place of the original
**       character references.
**    3. Parse the document created in step 2.  The newline characters in the
**       input document will be normalized to space characters.
**    4. Serialize the document parsed in step 3.  The output document will
**       contain space characters in place of the newline character references
**       in the original input document.
**
** To avoid this behavior, DOMWriter should generate character references for
** any newlines it encounters in attribute values.
*/

#include <iostream>

using namespace std;

#include <xercesc/framework/LocalFileFormatTarget.hpp>

#include <xercesc/parsers/XercesDOMParser.hpp>

#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOMImplementation.hpp>
#include <xercesc/dom/DOMImplementationLS.hpp>
#include <xercesc/dom/DOMWriter.hpp>
#include <xercesc/dom/DOMException.hpp>

#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>

#include <xercesc/sax/SAXException.hpp>

XERCES_CPP_NAMESPACE_USE

int main(int argc, char *argv[])
{
    int nRetVal = 0;
    XMLPlatformUtils::Initialize();

    try
    {
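        // Create a DOMWriter instance.  The L"..." literals in this program
        // assume XMLCh is wchar_t, as it is in Windows builds of Xerces-C.
        DOMImplementation* pImpl =
            DOMImplementationRegistry::getDOMImplementation(L"LS");
        DOMWriter* pDOMWriter =
            (dynamic_cast<DOMImplementationLS*>(pImpl))->createDOMWriter();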

        if (pDOMWriter->canSetFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true))
        {
            pDOMWriter->setFeature(XMLUni::fgDOMWRTFormatPrettyPrint, true);
        }

        XercesDOMParser parser;
        DOMDocument*    pDocument;

        // Parse initial input document with newline character references.
        parser.parse(L"TestInput1.xml");

        if ((pDocument = parser.getDocument()) != 0)
        {
            // Serialize the parsed document.  The output document will contain
            // actual newline characters.
            LocalFileFormatTarget lfft(L"TestOutput1.xml");
            pDOMWriter->writeNode(&lfft, *pDocument);
        }

        // Now parse the serialized document.
        parser.parse(L"TestOutput1.xml");

        if ((pDocument = parser.getDocument()) != 0)
        {
            // Re-serialize the document.  In this version, the newlines will
            // be replaced with spaces.
            LocalFileFormatTarget lfft(L"TestOutput2.xml");
            pDOMWriter->writeNode(&lfft, *pDocument);
        }

        pDOMWriter->release();
    }
    catch (...)
    {
        cerr << "Unexpected exception." << endl;
        nRetVal = 1;
    }

    XMLPlatformUtils::Terminate();
    return nRetVal;
}
---
Here is TestInput1.xml:
---
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<policy attr="Line1&#xa;Line2&#xa;Line3&#xa;"/>
---
Here is what you get for TestOutput1.xml:
---
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<policy attr="Line1
Line2
Line3
"/>
---
Here is what you get for TestOutput2.xml:
---
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<policy attr="Line1 Line2 Line3 "/>
---
