Bugs item #1165107, was opened at 2005-03-17 11:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=106473&aid=1165107&group_id=6473

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Magnus Lie Hetland (mlh)
Assigned to: Nobody/Anonymous (nobody)
Summary: sgmlop drops trailing partial tokens

Initial Comment:
Partial entities in the middle of the text are (appropriately)
reported as text by sgmlop. However, if the partial entity is
placed at the end of the text, it isn't reported at all.

This behavior would be understandable when using the feed method
alone, since more input might still arrive, but it also occurs
with the parse method (which closes the parser after the feed),
and that is unfortunate. It means (as far as I can see) that the
tail of the input is simply ignored.

One especially bad example is if the input contains -- or even
begins with -- a stray '<' character, without later containing a
'>' character. Then everything from that point on is ignored.

The following snippet demonstrates the problem:

    from xml.parsers.sgmlop import SGMLParser, XMLParser, XMLUnicodeParser

    class Handler:
        def handle_data(self, data):
            print 'Data:', repr(data)

    for text in ['<', '{', '<foo bar < " ', '</foo',
                 '< ', '{ ', 'frozz <foo bar < " ', 'bar </foo']:
        for parser in [SGMLParser(), XMLParser(), XMLUnicodeParser()]:
            parser.register(Handler())
            print '%s with %s:' % (repr(parser), repr(text))
            parser.parse(text)
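
To make the expected behavior concrete, here is a minimal sketch of a toy tokenizer that holds back a possibly incomplete tag during feed, but reports it as text once close is called. This is not sgmlop's code or API; the names ToyTokenizer, handle_tag and events_for are hypothetical and exist only for illustration, and the tokenizer handles nothing beyond '<...>' tags.

```python
class ToyTokenizer(object):
    """Hypothetical stand-in for a streaming markup tokenizer (not sgmlop)."""

    def __init__(self, handle_data, handle_tag):
        self.handle_data = handle_data
        self.handle_tag = handle_tag
        self.pending = ''  # input that has not produced an event yet

    def feed(self, chunk):
        self.pending += chunk
        while self.pending:
            start = self.pending.find('<')
            if start == -1:
                # No markup left: everything pending is plain text.
                self.handle_data(self.pending)
                self.pending = ''
                return
            if start > 0:
                # Report the text that precedes the '<'.
                self.handle_data(self.pending[:start])
                self.pending = self.pending[start:]
            end = self.pending.find('>')
            if end == -1:
                # Possibly incomplete tag: keep it and wait for more input.
                return
            self.handle_tag(self.pending[1:end])
            self.pending = self.pending[end + 1:]

    def close(self):
        # No more input can arrive, so an unfinished tag can only be text.
        if self.pending:
            self.handle_data(self.pending)
            self.pending = ''

    def parse(self, text):
        self.feed(text)
        self.close()


def events_for(text):
    """Collect the events produced by parsing text in one go."""
    events = []
    tok = ToyTokenizer(lambda d: events.append(('data', d)),
                       lambda t: events.append(('tag', t)))
    tok.parse(text)
    return events
```

With this sketch, events_for('bar </foo') yields two data events, 'bar ' and '</foo', so the tail of the input is reported instead of being dropped. Whether sgmlop should flush in close or in parse is a design question for its maintainers; the sketch only shows that the distinction between feed and parse makes the flush possible.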