Daniel, All,
before it's forgotten: does anyone have any clues about this,
please? Shall I file it in Bugzilla?
Thanks,
-- Petr

On Sunday 10 June 2007 23:10, Petr Pajas wrote:
> Hi,
>
> I have two files (also attached)
>
> 1) test.xml:
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE a [
>   <!ENTITY b SYSTEM "b.txt">
> ]>
> <a>&b;</a>
>
> 2) b.txt, which contains just "B"
>
> When parsing test.xml via the SAX2 interface, I get two character
> callbacks for the string "B". The problem can be reproduced with
> testSAX --noent from the libxml2 distribution:
>
> $ /home/pajas/h2/compile/gnome-xml/testSAX --noent test.xml
> SAX.setDocumentLocator()
> SAX.startDocument()
> SAX.internalSubset(a, , )
> SAX.entityDecl(b, 2, (null), b.txt, (null))
> SAX.externalSubset(a, , )
> SAX.startElement(a)
> SAX.getEntity(b)
> SAX.characters(B, 1)
> SAX.characters(B, 1)  <--- why?
> SAX.endElement(a)
> SAX.endDocument()
>
> (similarly if b.txt is complex XML - I get the same callbacks for
> nodes in the entity twice)
>
> Is this an expected behavior? If yes, can I somehow distinguish
> between the two calls (e.g. based on ctxt) so that I can filter
> one of them out?
>
> P.S. This was observed by one of the users of the Perl bindings
> for libxml2. We also have an interface to libxml2's reader API in
> Perl, but there are hundreds of very popular Perl modules
> built upon the SAX interface (mainly because Perl has really
> advanced SAX filtering and pipelining with interchangeable SAX
> implementations ranging from pure Perl and expat to libxml2;
> libxml2 is the fastest among them, which makes it very popular and
> thus worth maintaining).
>
> Thanks in advance,
> -- Petr
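
For anyone who wants to reproduce the report above, the two files can be recreated roughly as follows. This is only a sketch under a few assumptions that are not in the original message: a POSIX shell, a scratch directory created with mktemp, and a testSAX binary (built from the libxml2 source tree) reachable on PATH. The original run used a local build path instead. b.txt is written without a trailing newline, so that the entity content is the single character "B", matching the length of 1 in the trace.

```shell
# Work in a scratch directory so nothing existing is overwritten
# (the directory is an assumption, not part of the original report).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# test.xml exactly as quoted in the report; the quoted 'EOF'
# delimiter keeps the shell from touching the content.
cat > test.xml <<'EOF'
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE a [
  <!ENTITY b SYSTEM "b.txt">
]>
<a>&b;</a>
EOF

# b.txt contains just "B", with no trailing newline.
printf 'B' > b.txt

# Run the reproducer only if a testSAX binary is available on PATH
# (assumption: built from the libxml2 distribution).
if command -v testSAX >/dev/null 2>&1; then
    testSAX --noent test.xml
else
    echo "testSAX not found on PATH; build it from the libxml2 sources" >&2
fi
```

Counting the SAX.characters lines in the output (for example by piping through grep -c 'SAX.characters') is a quick way to see whether a given libxml2 build shows the duplicate callback.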
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml
