I am not sure whether I have found a bug or not.
I am using Python 2.4 and libxml2 2.6.22.
When I load a document containing an entity and then read the value
of a node containing that entity, I get the text content but the
entity disappears.
In the following example, when looking at root.content I would expect
to see '©2007', instead all I get is '2007'.
I was advised on the #XML IRC channel to construct a simple test
case, so here it is:
File 1: test.xml
<?xml version="1.0"?>
<!DOCTYPE content [
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin 1 for XHTML//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
%HTMLlat1;
]>
<content>
<p>&copy;2007</p>
</content>
File 2: testcase.py
import libxml2
sourcedoc = libxml2.parseFile( 'test.xml' )
root = sourcedoc.getRootElement()
print root.serialize()
print root.content
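The same test can also be sketched in a self-contained form that needs no access to the W3C entity file. This is only a sketch under stated assumptions: the inline declaration of "copy" stands in for xhtml-lat1.ent, the names SAMPLE and show_content are made up for illustration, and the import guard is there only so the script still loads where the bindings are missing.

```python
try:
    import libxml2
except ImportError:  # bindings not installed; the sketch then does nothing
    libxml2 = None

# Assumption: "copy" is declared inline instead of being pulled in
# from the external xhtml-lat1.ent file used in test.xml.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE content [\n'
    '<!ENTITY copy "&#169;">\n'
    ']>\n'
    '<content><p>&copy;2007</p></content>'
)

def show_content(xml_text):
    """Parse xml_text and return (serialized root, root.content)."""
    doc = libxml2.parseDoc(xml_text)
    try:
        root = doc.getRootElement()
        return root.serialize(), root.content
    finally:
        doc.freeDoc()  # libxml2 documents are not garbage collected

if __name__ == "__main__" and libxml2 is not None:
    serialized, content = show_content(SAMPLE)
    print(repr(serialized))
    print(repr(content))
```

If root.content includes the copyright sign in this variant but not with test.xml, that would suggest the difference lies in how the external entity file is loaded rather than in getContent itself.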
Reading the source for libxml2.py, I find the following:
    def getContent(self):
        """Read the value of a node, this can be either the text
           carried directly by this node if it's a TEXT node or the
           aggregate string of the values carried by this node
           child's (TEXT and ENTITY_REF). Entity references are
           substituted. """
        ret = libxml2mod.xmlNodeGetContent(self._o)
        return ret
From this, in my (admittedly limited) understanding, I would have
expected root.content to return the translated entity as well as the
text.
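Since the docstring talks about TEXT and ENTITY_REF children, one way to see what the parser actually built is to list the children of the <p> element one by one. The following is a rough sketch, not a definitive diagnostic: SAMPLE and describe_children are hypothetical names, the entity is declared inline as an assumption so the script does not depend on the external file, and the import guard only keeps the script loadable without the bindings.

```python
try:
    import libxml2
except ImportError:  # bindings not installed; the sketch then does nothing
    libxml2 = None

# Assumption: inline declaration of "copy" replaces the external DTD file.
SAMPLE = (
    '<!DOCTYPE content [<!ENTITY copy "&#169;">]>'
    '<content><p>&copy;2007</p></content>'
)

def describe_children(xml_text):
    """Return a (type, name, content) tuple for each child of the first <p>."""
    doc = libxml2.parseDoc(xml_text)
    try:
        # Find the first <p> element directly under the root.
        node = doc.getRootElement().children
        while node is not None and node.name != "p":
            node = node.next
        rows = []
        if node is not None:
            child = node.children
            while child is not None:
                rows.append((child.type, child.name, child.content))
                child = child.next
        return rows
    finally:
        doc.freeDoc()

if __name__ == "__main__" and libxml2 is not None:
    for row in describe_children(SAMPLE):
        print(repr(row))
```

Running the same function on the text of test.xml and comparing the two listings should show whether an entity reference node is present and whether it carries any content.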
Is this a bug, or am I doing something wrong?
Cheers
Mike
http://www.mikekneller.com