Daniel, all,

before it's forgotten, does anyone have any clues about this, please?
Shall I bugzilla it?

Thanks,
-- Petr
On Sunday 10 June 2007 23:10, Petr Pajas wrote:
> Hi,
>
> I have two files (also attached):
>
> 1) test.xml:
>
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE a [
> <!ENTITY b SYSTEM "b.txt">
> ]>
> <a>&b;</a>
>
> 2) b.txt, which contains just "B"
>
> When parsing test.xml via the SAX2 interface, I get two characters
> callbacks for the string "B". The problem can be reproduced with
> testSAX --noent from the libxml2 distribution:
>
> $ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
> SAX.setDocumentLocator()
> SAX.startDocument()
> SAX.internalSubset(a, , )
> SAX.entityDecl(b, 2, (null), b.txt, (null))
> SAX.externalSubset(a, , )
> SAX.startElement(a)
> SAX.getEntity(b)
> SAX.characters(B, 1)
> SAX.characters(B, 1) <--- why?
> SAX.endElement(a)
> SAX.endDocument()
>
> (Similarly, if b.txt contains more complex XML, I get the same
> callbacks for the nodes in the entity twice.)
>
> Is this expected behavior? If so, can I somehow distinguish between
> the two calls (e.g. based on ctxt) so that I can filter one of them
> out?
>
> P.S. This was observed by one of the users of the Perl bindings for
> libxml2. We also have an interface to libxml2's reader API in Perl,
> but there are hundreds of very popular Perl modules built upon the
> SAX interface. This is mainly because Perl has really advanced SAX
> filtering and pipelining, with interchangeable SAX implementations
> ranging from pure Perl to expat to libxml2. libxml2 is the fastest
> among them, which makes it very popular and thus worth maintaining.
>
> Thanks in advance,
> -- Petr

_______________________________________________
xml mailing list, project page http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
