On Sat, Jun 16, 2007 at 08:48:02AM +0200, Petr Pajas wrote:
> Daniel, All,
> before it's forgotten, does anyone have some clues about this,
You must implement an entity handler as part of the SAX callback block
which is compatible with libxml2 entities processing and your own needs.

> please? Shall I bugzilla it?
> Thanks,
> -- Petr
>
> On Sunday 10 June 2007 23:10, Petr Pajas wrote:
> > Hi,
> >
> > I have two files (also attached)
> >
> > 1) test.xml:
> > <?xml version="1.0" encoding="ISO-8859-1"?>
> > <!DOCTYPE a [
> > <!ENTITY b SYSTEM "b.txt">
> > ]>
> > <a>&b;</a>
> >
> > 2) b.txt, which contains just "B"
> >
> > When parsing test.xml via the SAX2 interface, I get two character
> > callbacks for the string "B". The problem can be reproduced with
> > testSAX --noent from the libxml2 distribution:
> >
> > $ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
> > SAX.setDocumentLocator()
> > SAX.startDocument()
> > SAX.internalSubset(a, , )
> > SAX.entityDecl(b, 2, (null), b.txt, (null))
> > SAX.externalSubset(a, , )
> > SAX.startElement(a)
> > SAX.getEntity(b)
> > SAX.characters(B, 1)
> > SAX.characters(B, 1) <--- why?

One when parsing the entity to make sure it's well formed the first
time you use the entity. One each time the entity must be delivered to
user land.

> > SAX.endElement(a)
> > SAX.endDocument()
> >
> > (similarly if b.txt is complex XML - I get the same callbacks for
> > nodes in the entity twice)
> >
> > Is this an expected behavior? If yes, can I somehow distinguish
> > between the two calls (e.g. based on ctxt) so that I can filter
> > one of them out?
> >
> > P.S. this was observed by one of the users of the Perl bindings
> > for libxml2. We also have interface for libxml2's reader API in
> > Perl too, but there are hundreds of very popular Perl modules
> > built upon the SAX interface (mainly because Perl has really
> > advanced sax filtering and pipelining with interchangeable SAX
> > implementations varying from pure-perl, expat, to libxml2;
> > libxml2 is the fastest among them which makes it very popular and
> > thus worth maintaining).

It's all dependent on how your entity handler is implemented, I think.
It's very tricky, I agree, which is why I suggest not using SAX in
general.

Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]    | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
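
For readers who want to picture the two passes described in the reply (one well-formedness check on first use, then one delivery per use), here is a minimal toy sketch in plain C. It does not use libxml2 at all. The names `entity_pass`, `filter_state`, `filtered_characters` and `toy_reference` are all hypothetical, and the assumption that your own entity handler can set a flag to mark the check pass is exactly that, an assumption; the thread does not say how a real handler would detect it. The callback only mirrors the general shape of a SAX characters callback (user data, buffer, length).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical phases of entity handling; these are NOT libxml2 types. */
typedef enum { PASS_CHECK, PASS_DELIVER } entity_pass;

/* Hypothetical user data shared by the callbacks. */
typedef struct {
    entity_pass pass;  /* assumed to be set by your own entity handler */
    char out[64];      /* text that actually reached user land */
    size_t len;        /* number of bytes stored in out */
} filter_state;

/* Characters callback that drops text seen during the check pass. */
static void filtered_characters(void *user_data, const char *ch, int len)
{
    filter_state *st = user_data;
    assert(st != NULL && ch != NULL && len >= 0);
    if (st->pass == PASS_CHECK)
        return;                              /* suppress the duplicate */
    if (st->len + (size_t)len >= sizeof st->out)
        return;                              /* toy bound: ignore overflow */
    memcpy(st->out + st->len, ch, (size_t)len);
    st->len += (size_t)len;
    st->out[st->len] = '\0';
}

/* Toy stand-in for the parser handling one entity reference: the first
 * reference triggers a check pass, and every reference triggers a
 * delivery pass. */
static void toy_reference(filter_state *st, const char *content, int *checked)
{
    int len = (int)strlen(content);
    if (!*checked) {
        st->pass = PASS_CHECK;
        filtered_characters(st, content, len);
        *checked = 1;
    }
    st->pass = PASS_DELIVER;
    filtered_characters(st, content, len);
}
```

Starting from a zeroed `filter_state` and `checked = 0`, calling `toy_reference(&st, "B", &checked)` once leaves a single "B" in `st.out` even though the callback ran twice. Whether a flag like this can be maintained in a real libxml2 SAX handler depends on how the entity handler is written, as noted above.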
