[xml] sax and entities

Petr Pajas Sun, 10 Jun 2007 14:08:37 -0700

Hi,

I have two files (also attached):

1) test.xml:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE a [
  <!ENTITY b SYSTEM "b.txt">
]>
<a>&b;</a>

2) b.txt, which contains just "B"
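
For anyone who wants to recreate the two files without the attachments, a
small script along these lines should work. It is only a sketch: the helper
name `write_fixtures` and the temporary directory are illustrative choices,
not part of libxml2, and b.txt is written without a trailing newline because
the trace reports a length of 1.

```python
import tempfile
from pathlib import Path

# Contents of test.xml exactly as quoted in this message.
TEST_XML = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<!DOCTYPE a [\n'
    '  <!ENTITY b SYSTEM "b.txt">\n'
    ']>\n'
    '<a>&b;</a>\n'
)


def write_fixtures(directory):
    """Write test.xml and b.txt side by side so the relative SYSTEM id resolves."""
    directory = Path(directory)
    xml_path = directory / "test.xml"
    ent_path = directory / "b.txt"
    xml_path.write_text(TEST_XML, encoding="iso-8859-1")
    # The entity file holds just "B", with no trailing newline.
    ent_path.write_text("B", encoding="iso-8859-1")
    return xml_path, ent_path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp(prefix="sax-entities-")
    xml_path, _ = write_fixtures(workdir)
    # Print the command from this message; adjust the path to your testSAX build.
    print(f"testSAX --noent {xml_path}")
```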

When parsing test.xml via the SAX2 interface, the characters callback is 
invoked twice for the string "B". The problem can be reproduced with 
testSAX --noent from the libxml2 distribution:

$ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
SAX.setDocumentLocator()
SAX.startDocument()
SAX.internalSubset(a, , )
SAX.entityDecl(b, 2, (null), b.txt, (null))
SAX.externalSubset(a, , )
SAX.startElement(a)
SAX.getEntity(b)
SAX.characters(B, 1)
SAX.characters(B, 1)  <--- why?
SAX.endElement(a)
SAX.endDocument()

(The same happens if b.txt contains more complex XML: every callback for the 
nodes in the entity is reported twice.)
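
With a larger entity the repetition is harder to spot by eye than in the
trace above. The sketch below scans a testSAX trace for any run of callbacks
that is immediately followed by an identical copy. The function name
`repeated_runs` is hypothetical, and the check is only a heuristic: a document
that legitimately contains identical adjacent content (for example two equal
sibling elements) is flagged as well.

```python
# The trace from this message, without the "<--- why?" annotation.
TRACE_LINES = [
    "SAX.setDocumentLocator()",
    "SAX.startDocument()",
    "SAX.internalSubset(a, , )",
    "SAX.entityDecl(b, 2, (null), b.txt, (null))",
    "SAX.externalSubset(a, , )",
    "SAX.startElement(a)",
    "SAX.getEntity(b)",
    "SAX.characters(B, 1)",
    "SAX.characters(B, 1)",
    "SAX.endElement(a)",
    "SAX.endDocument()",
]


def repeated_runs(lines):
    """Return (start_index, run_length) for each run repeated back to back."""
    hits = []
    i = 0
    while i < len(lines):
        found = 0
        # Try the longest candidate run first so a whole repeated block
        # is reported once rather than line by line.
        for n in range((len(lines) - i) // 2, 0, -1):
            if lines[i:i + n] == lines[i + n:i + 2 * n]:
                found = n
                break
        if found:
            hits.append((i, found))
            i += 2 * found  # skip past both copies
        else:
            i += 1
    return hits


if __name__ == "__main__":
    for start, length in repeated_runs(TRACE_LINES):
        print(f"{length} callback(s) repeated starting at index {start}:")
        for line in TRACE_LINES[start:start + length]:
            print("   ", line)
```

On the trace from this message it reports a single repeated run of length 1,
the SAX.characters(B, 1) call.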

Is this expected behavior? If so, can I somehow distinguish between the 
two calls (e.g. based on ctxt) so that I can filter one of them out?

P.S. This was observed by one of the users of the Perl bindings for libxml2. 
We also have a Perl interface to libxml2's reader API, but there are 
hundreds of very popular Perl modules built upon the SAX interface (mainly 
because Perl has really advanced SAX filtering and pipelining with 
interchangeable SAX implementations ranging from pure-Perl and expat to 
libxml2; libxml2 is the fastest among them, which makes it very popular and 
thus worth maintaining).

Thanks in advance,
-- Petr
