[issue26210] `HTMLParser.handle_data` may be invoked although `HTMLParser.reset` was invoked

2016-01-27 Yannick Duchêne
Yannick Duchêne added the comment: The documentation says:
> Reset the instance. Loses all unprocessed data.
How can parsing go ahead once all unprocessed data has been lost? It is the “Loses all unprocessed data” part that made me believe `reset` is meant to stop the parser. Maybe the documentation is unclear. By the …

2016-01-27 Yannick Duchêne
Yannick Duchêne added the comment: Thanks, Xiang, for the clear explanations. So an error should be raised when `reset` is invoked at a point where it should not be. That leaves the question of how to stop the parser: should an exception be raised and caught at an outer invocation level? Something like …
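
One way the "raise an error when `reset` is called during parsing" idea could be approximated in user code, as a sketch only: `GuardedParser` is a hypothetical subclass, not part of `html.parser`. It tracks whether `feed` or `close` is running and makes `reset` raise `RuntimeError` inside that window. The class-level `_parsing = False` default matters because `HTMLParser.__init__` itself calls `reset()`.

```python
from html.parser import HTMLParser


class GuardedParser(HTMLParser):
    """Hypothetical subclass that refuses reset() while feed()/close() run."""

    # Class-level default: HTMLParser.__init__ calls reset() before any
    # instance attribute could be set, so the guard must already exist.
    _parsing = False

    def feed(self, data):
        self._parsing = True
        try:
            super().feed(data)
        finally:
            self._parsing = False

    def close(self):
        self._parsing = True
        try:
            super().close()
        finally:
            self._parsing = False

    def reset(self):
        if self._parsing:
            # Inside a handler, reset() would not stop goahead(); fail loudly.
            raise RuntimeError("reset() called while parsing")
        super().reset()
```

Calling `reset()` between documents, after `feed`/`close` have returned, still behaves as before.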

2016-01-27 Xiang Zhang
Xiang Zhang added the comment: Hmm, I don't know whether I am right or not. Let's wait for a core member to clarify; if I am wrong, I apologize. I don't think invoking `reset` during parsing should raise an error (and I don't know how that would be achieved). When to invoke a subroutine is …

2016-01-27 Yannick Duchêne
Yannick Duchêne added the comment:
> And I don't see how to stop the process either.
I just did it with `raise StopIteration`, caught at the appropriate place (in the procedure that invokes `feed` and `close`), and it seems to work: I no longer see any strange behaviour. At least, I cannot see a …
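
A sketch of that approach: raise an exception from a handler and catch it in the code that calls `feed` and `close`. It uses a hypothetical `StopParsing` exception rather than `StopIteration`, so it cannot be confused with iterator protocol signals; `FirstParagraphParser` and `first_paragraph` are likewise illustrative names. After the exception is caught, `reset()` is called outside of `feed`, where it does discard the buffered input.

```python
from html.parser import HTMLParser


class StopParsing(Exception):
    """Hypothetical exception used only to unwind out of feed()/close()."""


class FirstParagraphParser(HTMLParser):
    """Collects text until the first </p>, then stops (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            # Calling self.reset() here would not stop goahead(); raise instead.
            raise StopParsing


def first_paragraph(html):
    parser = FirstParagraphParser()
    try:
        parser.feed(html)
        parser.close()
    except StopParsing:
        # Outside of feed(), reset() really discards the unprocessed input.
        parser.reset()
    return "".join(parser.text)
```

For example, `first_paragraph("<p>head</p>tail")` returns `"head"`: the exception unwinds out of `feed` before `"tail"` reaches `handle_data`.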

2016-01-27 Yannick Duchêne
Changes by Yannick Duchêne: nosy: +ezio.melotti

2016-01-27 Xiang Zhang
Xiang Zhang added the comment: Actually it does move forward: `goahead` first stores a "copy" of the initial `self.rawdata` in a local variable and uses that to control the flow. If you change `self.rawdata` during parsing, for example by calling `reset`, `goahead` cannot see the change, but the `parse_*` methods can. So …
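
To make that control flow concrete, here is a deliberately simplified toy model. It is not CPython's code; the class `SnapshotTokenizer` and its `;`-separated token format are invented for illustration. Like `goahead`, it binds `self.rawdata` to a local variable once, so a `reset` triggered from inside the loop does not stop the loop, and the leftover text is written back to `self.rawdata` at the end (CPython's `goahead` ends with a similar `self.rawdata = rawdata[i:]` assignment).

```python
class SnapshotTokenizer:
    """Hypothetical toy model of the goahead() pattern; not CPython code."""

    def __init__(self):
        self.rawdata = ""
        self.tokens = []

    def reset(self):
        # Like HTMLParser.reset, this only rebinds the attribute.
        self.rawdata = ""

    def feed(self, data):
        self.rawdata += data
        rawdata = self.rawdata            # local snapshot, taken once
        i = 0
        while True:
            j = rawdata.find(";", i)
            if j < 0:
                break                     # incomplete token: wait for more input
            token = rawdata[i:j]
            self.tokens.append(token)     # stands in for a handle_* callback
            if token == "stop":
                self.reset()              # self.rawdata is now "" ...
            i = j + 1                     # ... but the loop still walks the snapshot
        self.rawdata = rawdata[i:]        # leftover written back, undoing the reset


t = SnapshotTokenizer()
t.feed("a;stop;b;partial")
```

After this call, `t.tokens` is `["a", "stop", "b"]` and `t.rawdata` is `"partial"`: the token after `stop` was still handled, and the unprocessed tail survived the reset.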

2016-01-26 Yannick Duchêne
New submission from Yannick Duchêne: `HTMLParser.handle_data` may be invoked even after `HTMLParser.reset` has been invoked. This occurs at least when `HTMLParser.reset` is invoked from `HTMLParser.handle_endtag`. According to the documentation, `HTMLParser.reset` discards all data, so it should …
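
A minimal reproduction sketch, assuming Python 3's `html.parser` module; `ResettingParser` and the input string are made up for illustration. The subclass calls `reset()` from `handle_endtag` and records, for each `handle_data` call, the text and whether `self.rawdata` was empty at that moment.

```python
from html.parser import HTMLParser


class ResettingParser(HTMLParser):
    """Hypothetical subclass reproducing the report: reset() in handle_endtag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_data(self, data):
        # Record the text and whether the instance buffer was already emptied.
        self.seen.append((data, self.rawdata == ""))

    def handle_endtag(self, tag):
        self.reset()  # documented as "Loses all unprocessed data"


parser = ResettingParser()
parser.feed("<p>head</p>tail")
parser.close()
```

On CPython this should record `[("head", False), ("tail", True)]`: `"tail"` is still delivered after the reset, even though `self.rawdata` is already empty when the callback runs.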

2016-01-26 Xiang Zhang
Xiang Zhang added the comment: `reset` just sets some attributes back to their initial state; it does not control the parsing process. Reading the `goahead` function, even if `reset` is called in `handle_endtag` and all data is discarded, the process can still move forward.