Hm... Perhaps the xml-sig would be a better place to discuss this?

On 6/11/06, Thomas Broyer <[EMAIL PROTECTED]> wrote:
> Hi,
>
> First, sorry for posting this here, I closed my SourceForge account a
> few months ago and I can't get it reopened...
>
> I'm using python 2.2.1 but a diff on SVN showed that there was no
> change at this level, so the following bug should still be there in
> current versions (I'll try with a 2.4 at work tomorrow). On my
> machine, xml.sax.make_parser returns an
> xml.sax.expatreader.ExpatParser instance.
>
> The problem is: I'm never given END_DOCUMENT events.
>
> Code to reproduce:
>
> from xml.dom.pulldom import parseString
> reader = parseString('<element attribute="value">text</element>')
> # The following 2 lines will produce, in order:
> # START_DOCUMENT, START_ELEMENT, TEXT, END_ELEMENT
> # Note the lack of the END_DOCUMENT event.
> for event,node in reader:
>    print event
> # The following line will get an END_DOCUMENT event
> print reader.getEvent()[0]
> # The following line will throw a SAXParseException,
> # because the SAX parser's close method has been
> # called twice
> print reader.getEvent()[0]
>
>
> Cause:
>
> The xml.dom.pulldom.DOMEventStream.getEvent method, when it has no
> more event in its internal stack, calls the SAX parser's close()
> method (which is OK) then immediately returns 'None', ignoring any
> event that could have been generated by the call to the close()
> method. If you call getEvent later, it will send you the remaining
> events until there are no more left, and then will call the SAX
> parser's close() method again, causing a SAXParseException.
> Because expat (an maybe other parsers too) has no way to know when the
> document ends, it generates the endDocument/END_DOCUMENT event only
> when explicitely told that the XML chunk is the final one (i.e. when
> the close() method is called)
>
>
> Proposed fix:
>
> Add a "parser_closed" attribute to the DOMEventStream class,
> initialized to "False". After having called self.parser.close() in the
> xml.dom.pulldom.DOMEventStream.getEvent method, immediately set this
> "parser_closed" attribute to True and proceed. Finally, at the
> beginning of the "while" loop, immediately returns "None" if
> "parser_closed" is "True" to prevent a second call to
> self.parser.close().
> With this change, any call to getEvent when there are no event left
> will return None and never throw an exception, which I think is the
> expected behavior.
>
>
> Proposed code:
>
> The "closed" attribute is initialized in the "__init__" method:
>     def __init__(self, stream, parser, bufsize):
>         self.stream = stream
>         self.parser = parser
>         self.parser_closed = False
>         self.bufsize = bufsize
>         if not hasattr(self.parser, 'feed'):
>             self.getEvent = self._slurp
>         self.reset()
>
> The "getEvent" method becomes:
>     def getEvent(self):
>         # use IncrementalParser interface, so we get the desired
>         # pull effect
>         if not self.pulldom.firstEvent[1]:
>             self.pulldom.lastEvent = self.pulldom.firstEvent
>         while not self.pulldom.firstEvent[1]:
>             if self.parser_closed:
>                 return None
>             buf = self.stream.read(self.bufsize)
>             if buf:
>                 self.parser.feed(buf)
>             else:
>                 self.parser.close()
>                 self.parser_closed = True
>         rc = self.pulldom.firstEvent[1][0]
>         self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
>         return rc
>
> The same problem seems to exist in the
> xml.dom.pulldom.DOMEventStream._slurp method, when the SAX parser is
> not an IncrementalParser, as the parser's close() method is never
> called. I suggest adding a call to the close() method in there.
> However, as I only have expat as an option, which implements
> IncrementalParser, I can't test it...
> The _slurp method would become:
>     def _slurp(self):
>         """ Fallback replacement for getEvent() using the
>             standard SAX2 interface, which means we slurp the
>             SAX events into memory (no performance gain, but
>             we are compatible to all SAX parsers).
>         """
>         self.parser.parse(self.stream)
>         self.parser.close()
>         self.getEvent = self._emit
>         return self._emit()
> The _emit method raises exceptions when there are no events left, so I
> propose changing it to:
>     def _emit(self):
>         """ Fallback replacement for getEvent() that emits
>             the events that _slurp() read previously.
>         """
>         if not self.pulldom.firstEvent[1]:
>             return None
>         rc = self.pulldom.firstEvent[1][0]
>         self.pulldom.firstEvent[1] = self.pulldom.firstEvent[1][1]
>         return rc
>
> Hope this helps.
>
> --
> Thomas Broyer
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to