On Fri, Oct 27, 2006 at 05:50:15PM +0200, Keim, Markus wrote:
> Hi Daniel,
>
> OK, I've updated the function comment of xmlNodeSetContent()
> with a note similar to xmlNewDocNode(), which works on the
> same level AFAICS, something along
>
>  * NOTE: @content is supposed to be a piece of XML CDATA, so it allows entity
>  * references, but XML special chars need to be escaped first by using
>  * xmlEncodeEntitiesReentrant() resp. xmlEncodeSpecialChars().
>
> Additionally, xmlNodeAddContent() has a note on the different behaviour
> WRT xmlNodeSetContent() and an explicit hint *not* to call
> xmlEncodeEntitiesReentrant().
>
> I would've sent the patch, but while going through the source
> of these calls, I noticed another, maybe more serious issue.
> The note above (which appears in the docs for several API calls,
> e.g. xmlNewDocNode() and xmlNewChild()) is not correct IMHO,
> these calls provide no entity support, at least not for arbitrary
> input.
> As correctly documented, you have to call xmlEncodeEntitiesReentrant(),
> resp. xmlEncodeSpecialChars(), since special XML characters have
> to be replaced on that level. But a call to these functions will
> also replace the ampersand of a possible entity reference in the
> content buffer.
> I've tested it, e.g. calling
>   xmlNodeSetContent(node, BAD_CAST "&myEnt;")
> with a declared entity "myEnt" will create an XML_ENTITY_REF_NODE
> child node with name "myEnt" and the declared content, I'd say that's
> what's meant with entity support.
> But for arbitrary content, we have to call xmlEncodeEntitiesReentrant()
> first, and calling
>   xmlNodeSetContent(node,
>                     xmlEncodeEntitiesReentrant(BAD_CAST "&myEnt;"))
> will result in an XML_TEXT_NODE child node with content "&myEnt;",
> this will be serialized to "&amp;myEnt;" if we dump the node to disk!
>
> I first thought that I must be wrong, because this would be quite
> a concern, but I've tested it and get the above mentioned behaviour.
>
> Any thoughts?
Either your content is already escaped, which should be the case if you
have existing entity references in your strings, and it seems obvious
you should not call for a second escaping, or it is not, and in that
case, as the documentation explains, you should call it.
If you put yourself in a situation where you have a string containing
both complete entity references and single & characters, then no libxml2
API will both preserve the existing references and escape the lone &.
It's a matter of layering: either your string is a markup fragment or it
is not. In the first case it should already be escaped, and in the second
you must escape.
I see no problem here, really,

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
                     | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
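A minimal sketch of the layering point made in the reply, written without libxml2 so it stands alone. `escape_markup()` is a hypothetical stand-in for an escaping call such as xmlEncodeSpecialChars(): it rewrites only &, < and >, and is not libxml2's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a character-level escaper. Rewrites &, < and >
 * into `out`, writes at most cap-1 bytes plus a terminator, returns `out`. */
char *escape_markup(const char *in, char *out, size_t cap) {
    size_t n = 0;
    assert(in != NULL && out != NULL);
    if (cap == 0)
        return out;

    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
            case '&': rep = "&amp;"; break;
            case '<': rep = "&lt;";  break;
            case '>': rep = "&gt;";  break;
            default:  rep = single;  break;
        }
        size_t len = strlen(rep);
        if (n + len >= cap)
            break;                 /* stop rather than overflow the buffer */
        memcpy(out + n, rep, len);
        n += len;
    }
    out[n] = '\0';
    return out;
}
```

Escaping the plain text "a & b" once gives "a &amp; b". Escaping that result again gives "a &amp;amp; b", which is the second escaping the reply warns against. A mixed string such as "&myEnt; & more" comes out as "&amp;myEnt; &amp; more": the escaper works character by character and cannot tell a complete reference from a lone ampersand, so the caller has to decide up front whether the string is markup or plain text.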
