Daniel Veillard wrote:
> On Fri, Jan 25, 2008 at 11:33:05AM +0100, Julien Charbon wrote:
>> Hi all,
>>
>> it seems that the function calls:
>>
>> buffer = xmlEncodeEntitiesReentrant(doc, value);
>> list = xmlStringGetNodeList(doc, buffer);
>>
>> can be exactly replaced by a simple:
>>
>> list = xmlNewDocText(doc, value);
>>
>> You will find these calls in tree.c, more precisely in
>> xmlNewPropInternal() and in xmlSetNsProp(), both called by xmlSetProp().
>>
>> In fact, all that xmlEncodeEntitiesReentrant() does is exactly
>> undone by xmlStringGetNodeList(). Are there any
>> technical/practical/historical reasons to keep these calls in tree.c?
>>
>> Below is a patch that does this replacement on current trunk [just to
>> illustrate my concern]. Our application and libxml2's "make tests" are
>> happy with this change.
>
> I don't believe the patch is right, because an attribute's
> list of children can be a list of text nodes and entity references,
> and your patch reduces it to just the case where you don't have an
> entity reference in the attribute value. Even if broken parser APIs
> like SAX let people believe that attribute values can only be made
> of one text node, this is not true from the spec POV, and libxml2, which
> was designed as an editing toolkit, allows maintaining entity references
> in attribute values.

Thanks for your fast and clear answer. I totally agree with it,
but... with the current implementation, in this case:

(1) buffer = xmlEncodeEntitiesReentrant(doc, value);
(2) list = xmlStringGetNodeList(doc, buffer);

xmlStringGetNodeList() will always return a list with only one
XML_TEXT_NODE element, because xmlEncodeEntitiesReentrant() escapes every
'&' as '&amp;'. To be clear: if value is "&myent;", after (1) buffer will
be set to "&amp;myent;", and after (2) list will contain only one
XML_TEXT_NODE element with its content set to "&myent;".

Thus:

"&myent;" -> (1) -> "&amp;myent;" -> (2) -> "&myent;"
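
To make the round trip concrete without depending on libxml2, here is a toy model of steps (1) and (2). Both helpers, toy_escape() and toy_unescape(), are hypothetical names written for this illustration; they are not libxml2 functions and only handle '&', '<' and '>'. The sketch assumes that step (2) would turn any reference other than the predefined ones into a separate entity reference node, and simply counts those.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for step (1), NOT libxml2 code: escape '&', '<' and '>'.
 * The caller frees the returned string. */
char *toy_escape(const char *in)
{
    /* Worst case: every input byte becomes "&amp;" (5 bytes). */
    char *out = malloc(strlen(in) * 5 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        const char *rep = NULL;

        if (*in == '&')
            rep = "&amp;";
        else if (*in == '<')
            rep = "&lt;";
        else if (*in == '>')
            rep = "&gt;";

        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(p, rep, n);
            p += n;
        } else {
            *p++ = *in;
        }
    }
    *p = '\0';
    return out;
}

/* Toy stand-in for step (2), NOT libxml2 code: expand "&amp;", "&lt;" and
 * "&gt;" back to characters. Any other "&name;" is copied as-is and counted
 * in *nrefs, as a model of "this would become an entity reference node".
 * The caller frees the returned string. */
char *toy_unescape(const char *in, int *nrefs)
{
    /* The output is never longer than the input. */
    char *out = malloc(strlen(in) + 1);
    char *p = out;

    if (out == NULL)
        return NULL;
    *nrefs = 0;
    while (*in != '\0') {
        if (strncmp(in, "&amp;", 5) == 0) {
            *p++ = '&';
            in += 5;
        } else if (strncmp(in, "&lt;", 4) == 0) {
            *p++ = '<';
            in += 4;
        } else if (strncmp(in, "&gt;", 4) == 0) {
            *p++ = '>';
            in += 4;
        } else {
            if (*in == '&') {
                const char *end = in + 1;

                while (isalnum((unsigned char) *end))
                    end++;
                if (end > in + 1 && *end == ';')
                    (*nrefs)++;
            }
            *p++ = *in++;
        }
    }
    *p = '\0';
    return out;
}
```

In this model, toy_unescape(toy_escape(value), &n) always gives back value with n == 0, because no bare '&' survives the escaping step. Calling toy_unescape() directly on "&myent;" counts one reference instead, which is the case Daniel describes.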

Below is a tiny test that focuses on that case:

== test-xml-tiny.c ==
#include <stdio.h>
#include <stdlib.h>
#include <libxml/tree.h>
#include <libxml/entities.h>

int main(void) {
    LIBXML_TEST_VERSION;

    /* Attribute value with entities and '<' '>' */
    const xmlChar *attributeValue =
        BAD_CAST "&foo; &bar; & <tag> &myent </tag> &&";
    xmlChar *buffer = NULL;
    xmlNodePtr nodeList = NULL;
    xmlDocPtr doc = NULL;

    doc = xmlNewDoc(BAD_CAST "1.0");
    /* (1) escape the value, then (2) parse it back into a node list */
    buffer = xmlEncodeEntitiesReentrant(doc, attributeValue);
    nodeList = xmlStringGetNodeList(doc, buffer);

    if (nodeList != NULL && nodeList->next == NULL)
        printf("Only one element in return of xmlStringGetNodeList\n");
    if (nodeList != NULL)
        printf("%s\n", (const char *) nodeList->content);

    xmlFreeNodeList(nodeList);
    xmlFree(buffer);
    xmlFreeDoc(doc);
    return 0;
}
== test-xml-tiny.c ==

With current libxml2 trunk, it gives me:

$ gcc test-xml-tiny.c -o test-xml-tiny $(xml2-config --cflags) \
$(xml2-config --libs)
$ ./test-xml-tiny
Only one element in return of xmlStringGetNodeList
&foo; &bar; & <tag> &myent </tag> &&
No change. Maybe, historically, it was not always the case...
--
Julien
_______________________________________________
xml mailing list, project page http://xmlsoft.org/
http://mail.gnome.org/mailman/listinfo/xml