Daniel Veillard wrote:
> On Fri, Jan 25, 2008 at 11:33:05AM +0100, Julien Charbon wrote:
>>   Hi all,
>>
>> it seems that the function calls:
>>
>>   buffer = xmlEncodeEntitiesReentrant(doc, value)
>>   list = xmlStringGetNodeList(doc, buffer);
>>
>> can be exactly replaced by a simple:
>>
>>   list = xmlNewDocText(doc, value);
>>
>>   You will find these calls in tree.c, more precisely in
>> xmlNewPropInternal() and in xmlSetNsProp(), both called by xmlSetProp().
>>
>>   In fact, everything xmlEncodeEntitiesReentrant() does is exactly
>> undone by xmlStringGetNodeList(). Is there any
>> technical/practical/historical reason to keep these calls in tree.c?
>>
>>   Below is a patch that does this replacement on current trunk (just to
>> illustrate my concern). Our application and libxml2's "make tests" are
>> happy with this change.
> 
>   I don't believe the patch is right, because the list of children of an
> attribute can be a list of text nodes and entity references, and your
> patch reduces it to just the case where there is no entity reference
> in the attribute value. Even if broken parser APIs like SAX let people
> believe that attribute values can only be made of one text node, this
> is not true from the spec POV, and libxml2, which was designed as an
> editing toolkit, allows and maintains entity references in attribute
> values.

  Thanks for your fast and clear answer; I totally agree with it,
but... with the current implementation, in this case:

(1) buffer = xmlEncodeEntitiesReentrant(doc, value)
(2) list = xmlStringGetNodeList(doc, buffer);

xmlStringGetNodeList() will always return a list with only one
XML_TEXT_NODE element, because xmlEncodeEntitiesReentrant() escapes every
'&' as '&amp;'. In short, if value is "&myent;", after (1) buffer will
be set to "&amp;myent;", and after (2) list will contain only one
XML_TEXT_NODE element with its content set to "&myent;".

Thus:
"&myent;" -> (1) -> "&amp;myent;" -> (2) -> "&myent;"

Below is a tiny test that focuses on that case:

== test-xml-tiny.c ==
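#include <stdio.h>
#include <stdlib.h>

#include <libxml/tree.h>
#include <libxml/entities.h>   /* xmlEncodeEntitiesReentrant() */

int main(void) {

   LIBXML_TEST_VERSION

   /* Attribute value with entities and '<' '>' */
   const xmlChar *attributeValue =
       BAD_CAST "&foo; &bar; &amp; <tag> &myent </tag> &&";
   xmlChar *buffer = NULL;
   xmlNodePtr nodeList = NULL;
   xmlDocPtr doc = NULL;

   doc = xmlNewDoc(BAD_CAST "1.0");

   /* (1) escape the value, then (2) parse it back into a node list */
   buffer = xmlEncodeEntitiesReentrant(doc, attributeValue);
   nodeList = xmlStringGetNodeList(doc, buffer);
   if (nodeList == NULL)
     return EXIT_FAILURE;

   if (nodeList->next == NULL)
     printf("Only one element in return of xmlStringGetNodeList\n");

   printf("%s\n", (const char *) nodeList->content);

   xmlFree(buffer);
   xmlFreeNodeList(nodeList);
   xmlFreeDoc(doc);
   xmlCleanupParser();
   return 0;
}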
== test-xml-tiny.c ==

With the current libxml2 trunk, this gives me:
$ gcc test-xml-tiny.c -o test-xml-tiny $(xml2-config --cflags) \
   $(xml2-config --libs)
$ ./test-xml-tiny
Only one element in return of xmlStringGetNodeList
&foo; &bar; &amp; <tag> &myent </tag> &&

  No change: the value comes back exactly as it went in. Maybe,
historically, this was not always the case...

--
Julien
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml