On Fri, Jan 25, 2008 at 02:39:22PM +0100, Julien Charbon wrote:
> Daniel Veillard wrote:
> > On Fri, Jan 25, 2008 at 11:33:05AM +0100, Julien Charbon wrote:
> >> Hi all,
> >>
> >> it seems that the function calls:
> >>
> >> buffer = xmlEncodeEntitiesReentrant(doc, value)
> >> list = xmlStringGetNodeList(doc, buffer);
> >>
> >> can be exactly replaced by a simple:
> >>
> >> list = xmlNewDocText(doc, value);
> >>
> >> You will find these calls in tree.c. More precisely in
> >> xmlNewPropInternal() and in xmlSetNsProp(), both called by xmlSetProp().
> >>
> >> In fact, all that xmlEncodeEntitiesReentrant() does is exactly
> >> undone by xmlStringGetNodeList(). Are there any
> >> technical/practical/historical reasons to keep these calls in tree.c?
> >>
> >> Below is a patch that does this replacement on current trunk. [Just to
> >> illustrate my concern]. Our application and libxml2's "make tests" are
> >> happy with this change.
> >
> > I don't believe the patch is right, because an attribute's
> > list of children can be a list of text and entity references,
> > and your patch reduces it to just the case where you don't have an
> > entity reference in attribute values. Even if broken parser APIs
> > like SAX let people believe that attribute values can only be made
> > of one text node, this is not true from the spec POV, and libxml2, which
> > was designed as an editing toolkit, allows maintaining entity references
> > in attribute values.
>
> Thanks for your fast and clear answer. I totally agree with it,
> but... With the current implementation, and in this case:
>
> (1) buffer = xmlEncodeEntitiesReentrant(doc, value)
> (2) list = xmlStringGetNodeList(doc, buffer);
>
> xmlStringGetNodeList() will always return a list with only one
> XML_TEXT_NODE element, because xmlEncodeEntitiesReentrant() escapes every
> '&' to '&amp;'.
> Put plainly, if value is "&myent;", after (1) buffer will
> be set to "&amp;myent;" and after (2) list will contain only one
> XML_TEXT_NODE element with its content set to "&myent;".
>
> Thus:
> "&myent;" -> (1) -> "&amp;myent;" -> (2) -> "&myent;"

  argh, right .... I'm afraid the escaping was added as an afterthought;
it was not supposed to be that way. Oh well, one can still build complex
attribute values 'by hand' with the help of the API, but I think we somehow
defeated the initial purpose of the xmlStringGetNodeList() call.

>
> It gives me, with current libxml2 trunk:
> $ gcc test-xml-tiny.c -o test-xml-tiny $(xml2-config --cflags) \
>       $(xml2-config --libs)
> $ ./test-xml-tiny
> Only one element in return of xmlStringGetNodeList
> &foo; &bar; & <tag> &myent </tag> &&
>
> No change. Maybe, historically, it was not always the case...

  Hum, yes. The only other thing your suggested change would lose is the
error messages resulting from the validations occurring in
xmlEncodeEntitiesReentrant(); problems reported there would otherwise go
unnoticed. Is that still worth the extra complexity or not? I'm not sure.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
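
To make the round trip discussed in this thread concrete, here is a minimal, self-contained sketch in plain C. It does not call libxml2: escape_amp() and unescape_amp() are hypothetical stand-ins that model only the '&' handling attributed above to xmlEncodeEntitiesReentrant() and xmlStringGetNodeList(). The real functions do more (other characters, character references, building nodes), so treat this as an illustration of the argument, not of the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for step (1): every '&' becomes "&amp;".
 * Returns a malloc'ed string that the caller must free. */
char *escape_amp(const char *src)
{
    assert(src != NULL);
    /* Worst case: every input byte is '&' and grows to 5 bytes. */
    char *out = malloc(5 * strlen(src) + 1);
    if (out == NULL)
        return NULL;
    char *p = out;
    for (; *src != '\0'; src++) {
        if (*src == '&') {
            memcpy(p, "&amp;", 5);
            p += 5;
        } else {
            *p++ = *src;
        }
    }
    *p = '\0';
    return out;
}

/* Hypothetical stand-in for step (2): "&amp;" is turned back into '&'.
 * Any other '&' would be the start of an entity reference, i.e. the case
 * where a parser would emit a separate reference node; here it is only
 * counted in *refs and copied through unchanged. */
char *unescape_amp(const char *src, int *refs)
{
    assert(src != NULL && refs != NULL);
    char *out = malloc(strlen(src) + 1);  /* output never grows */
    if (out == NULL)
        return NULL;
    char *p = out;
    *refs = 0;
    while (*src != '\0') {
        if (strncmp(src, "&amp;", 5) == 0) {
            *p++ = '&';
            src += 5;
        } else {
            if (*src == '&')
                (*refs)++;
            *p++ = *src++;
        }
    }
    *p = '\0';
    return out;
}
```

Feeding "&myent;" through escape_amp() and then unescape_amp() gives back the original string with the reference count still at zero. After step (1) no bare '&' is left for step (2) to interpret as an entity reference, which is the point made in the quoted text.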
