OK, I tried something different, and here is the code of what I tried:
{
    const char* str = "<>\"'&";              // raw, unescaped special characters

    // case 1: special chars as element content
    xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "test");
    if( node ) {
        xmlNodeSetContent(node, BAD_CAST str);
    }
    string xml_str;
    CbXmlNodeToXmlFormatString(node, xml_str);
    cout << xml_str << endl << endl;

    // case 2: special chars as element name
    xmlNodePtr node2 = xmlNewNode(NULL, BAD_CAST str);
    CbXmlNodeToXmlFormatString(node2, xml_str);
    cout << xml_str << endl << endl;

    // case 3: special chars as attribute name
    xmlNodePtr node3 = xmlNewNode(NULL, BAD_CAST "test");
    if( node3 ) {
        xmlNewProp(node3, BAD_CAST str, BAD_CAST "test");
    }
    CbXmlNodeToXmlFormatString(node3, xml_str);
    cout << xml_str << endl << endl;

    // case 4: special chars as attribute value
    xmlNodePtr node4 = xmlNewNode(NULL, BAD_CAST "test");
    if( node4 ) {
        xmlNewProp(node4, BAD_CAST "test", BAD_CAST str);
    }
    CbXmlNodeToXmlFormatString(node4, xml_str);
    cout << xml_str << endl << endl;
}
{
    const char* str = "&lt;&gt;&quot;'&amp;";   // the same characters, already encoded

    // case 1: encoded string as element content
    xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "test");
    if( node ) {
        xmlNodeSetContent(node, BAD_CAST str);
    }
    string xml_str;
    CbXmlNodeToXmlFormatString(node, xml_str);
    cout << xml_str << endl << endl;

    // case 2: encoded string as element name
    xmlNodePtr node2 = xmlNewNode(NULL, BAD_CAST str);
    CbXmlNodeToXmlFormatString(node2, xml_str);
    cout << xml_str << endl << endl;

    // case 3: encoded string as attribute name
    xmlNodePtr node3 = xmlNewNode(NULL, BAD_CAST "test");
    if( node3 ) {
        xmlNewProp(node3, BAD_CAST str, BAD_CAST "test");
    }
    CbXmlNodeToXmlFormatString(node3, xml_str);
    cout << xml_str << endl << endl;

    // case 4: encoded string as attribute value
    xmlNodePtr node4 = xmlNewNode(NULL, BAD_CAST "test");
    if( node4 ) {
        xmlNewProp(node4, BAD_CAST "test", BAD_CAST str);
    }
    CbXmlNodeToXmlFormatString(node4, xml_str);
    cout << xml_str << endl << endl;
}
and the results:
error : unterminated entity reference
<test>&lt;&gt;"'</test>
<<>"'&/>
<test <>"'&="test"/>
<test test="&lt;&gt;&quot;'&amp;"/>
<test>&lt;&gt;"'&amp;</test>
<&lt;&gt;&quot;'&amp;/>
<test &lt;&gt;&quot;'&amp;="test"/>
<test test="&amp;lt;&amp;gt;&amp;quot;'&amp;amp;"/>
In case 1 it does not encode the &, but drops it with an error message, while
it does encode all the other characters. Igor's explanation was totally
acceptable to me, but as seen in case 4 everything is converted there.
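Just to make clear what I mean: the only workaround I have found so far for
the content case is to build the text as a raw text node instead of going
through xmlNodeSetContent(). This is only a sketch and my reading of the API
(assuming xmlNewText() keeps the string verbatim and the escaping happens at
serialization time), I have not verified it everywhere:

/* sketch: raw text node, so the lone '&' is never parsed as an entity */
xmlNodePtr node = xmlNewNode(NULL, BAD_CAST "test");
if( node ) {
    xmlNodePtr text = xmlNewText(BAD_CAST "<>\"'&");   /* raw, unescaped UTF-8 */
    if( text ) {
        xmlAddChild(node, text);   /* should come out as &lt;&gt;"'&amp; */
    }
}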
Cases 2 and 3 can be ignored, as the result is not valid XML in either case
and I have to make sure that I use valid characters for the attribute or node
name.
But in the last case I give the attribute an already properly encoded string
and it double-encodes it. This also feels wrong to me, because it is nowhere
mentioned whether you have to encode the value yourself or whether libxml2
does the encoding for you. At least something like this should be added to
the documentation.
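If I read the apidocs correctly, the intended split seems to be that
xmlNewProp() wants the raw value and escapes it itself at output time, while
xmlNodeSetContent() expects CDATA that is already escaped (its description
points to xmlEncodeEntitiesReentrant()/xmlEncodeSpecialChars()). That would
explain both results, but it is only my interpretation:

/* my reading of the apidocs, not verified: */
xmlNewProp(node4, BAD_CAST "test", BAD_CAST "<>\"'&");      /* raw value, libxml2 escapes it */
xmlNodeSetContent(node, BAD_CAST "&lt;&gt;&quot;'&amp;");   /* content must already be escaped */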
If the first case behaved properly on the ampersand, I would only have to
make sure that my input is proper UTF-8, not worry about those "bad" special
characters, and would not have to convert anything myself. Now I have to do
the same as libxml2 with my input string: parse it for the "unterminated
entity" (the stand-alone ampersand) and convert it.
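So instead of hand-parsing for the stand-alone ampersand I will probably let
libxml2 do the pre-encoding itself, something along these lines (again only a
sketch, assuming xmlEncodeSpecialChars() escapes <, >, & and " and that NULL
is allowed for the document parameter):

xmlChar* escaped = xmlEncodeSpecialChars(NULL, BAD_CAST "<>\"'&");
if( escaped ) {
    xmlNodeSetContent(node, escaped);   /* no "unterminated entity reference" anymore */
    xmlFree(escaped);
}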
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml