[issue17741] event-driven XML parser

2014-03-11 Thread Stefan Behnel

Stefan Behnel added the comment:

My latest status is that a decision on the future of the parser argument is 
still pending. See #20219.

It's correctly deprecated in the sense that passing any previously existing 
parser isn't going to be supported anymore, but passing an XMLPullParser (which 
is new in Py3.4 and thus can't have been used in existing code) now makes 
perfect sense (IMHO). So, in a way, it might end up as a sort of argument 
recycling rather than argument deletion.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2014-03-11 Thread Nick Coghlan

Nick Coghlan added the comment:

Yeah, I'd agree with Stefan here - documented deprecation is reasonable at this 
point (so that new users avoid it for now), but we may still undeprecate it 
later if we decide it makes sense to do so.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2014-03-11 Thread R. David Murray

R. David Murray added the comment:

I think the current status of whatsnew is OK for this, then.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2014-03-11 Thread R. David Murray

R. David Murray added the comment:

Someone should add programatic deprecation for the html argument, though, since 
people need to switch to keyword-based calls to XMLParser to prepare for that 
going away.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2014-03-10 Thread R. David Murray

R. David Murray added the comment:

As far as I can tell from scanning this ticket, there was no agreement on 
deprecating the parser argument to iterparse, but it has been documented as 
deprecated with no documentation about what to replace it with, nor any 
DeprecationWarning in the code that I can see.  Should I remove the deprecation 
documentation and someone can figure out what to do about it in 3.5, or was the 
decision just not recorded here (or I missed it)?  (It doesn't look like 
there's a programatic deprecation for the html argument either.)

--
nosy: +r.david.murray

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2014-03-10 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 31e6adf5bfba by R David Murray in branch 'default':
whatsnew: deprecation of ElementTree XMLParser *html* and iterparse *parser*.
http://hg.python.org/cpython/rev/31e6adf5bfba

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-09-13 Thread Stefan Behnel

Stefan Behnel added the comment:

The way the XMLPullParser is implemented in lxml.etree now is that it simply 
inherits from XMLParser. This would also make sense for ElementTree, even 
before supporting arbitrary targets. The patch in ticket #18990 makes this 
simple to do.

For reference, here is the implementation in lxml.

Iterparse:

https://github.com/lxml/lxml/blob/master/src/lxml/iterparse.pxi

XMLPullParser:

https://github.com/lxml/lxml/blob/d9f7cd8d12a27cafc4d65c6e280ea36156e3b837/src/lxml/parser.pxi#L1357

SAX based parser that collects events and/or maps parser callbacks to callbacks 
on the target object:

https://github.com/lxml/lxml/blob/master/src/lxml/saxparser.pxi

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-09-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Looks like we missed the alpha2 release for the close() API fix. I recommend 
not letting yet another deadline go by.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-09-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Created separate issue18990 to keep this one closed as is.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-09-08 Thread Stefan Behnel

Stefan Behnel added the comment:

While refactoring the iterparse() implementation in lxml to support this new 
interface, I noticed that the close() method of the XMLPullParser does not 
behave like the close() method of the XMLParser. Instead of setting some .root 
attribute on the parser instance, the method should return the root element 
that the tree builder generated.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-09-01 Thread Eli Bendersky

Eli Bendersky added the comment:

I'm closing this issue as fixed because the feature was implemented. As 
discussed, opened a new issue (#18902) to discuss re-designing some parts of 
the current code that would allow a nicer implementation of this feature. 
Anyone interested in the subject - feel free to subscribe yourself there.

--
resolution:  - fixed
stage: needs patch - committed/rejected
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-31 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 1faaec66c73d by Eli Bendersky in branch 'default':
Fix XMLPullParser documentation to say non-blocking instead of asynchronous.
http://hg.python.org/cpython/rev/1faaec66c73d

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-30 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 8fd72b1bb262 by Eli Bendersky in branch 'default':
Issue #17741: Rename IncrementalParser and its methods.
http://hg.python.org/cpython/rev/8fd72b1bb262

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-30 Thread Eli Bendersky

Eli Bendersky added the comment:

This issue has become too large and discusses a few different things; hence I'm 
inclined to close it as fixed and open a new one as a placeholder for 
discussing the new design of the internals.

I'll do this in a couple of days if there are no objections.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-28 Thread Eli Bendersky

Eli Bendersky added the comment:

This patch implements the renaming and updates the documentation.

--
Added file: http://bugs.python.org/file31497/issue17741.api-renames.1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-28 Thread Stefan Behnel

Stefan Behnel added the comment:

Any comments regarding my naming suggestion?

Calling it a push parser is just too ambiguous.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-28 Thread Stefan Behnel

Stefan Behnel added the comment:

Erm, pull parser, but you see what I mean.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Stefan Behnel

Stefan Behnel added the comment:

 Putting _setevents aside for the moment,

Agreed, obviously.


 XMLParser is a clean and simple API. Its output is only push (by calling 
 callbacks on the target). It doesn't deal with Elements at all.

We already agreed on that, too. Keep things simple.


 It decomposes the input XML and produces push events. That's it. It doesn't 
 even have to interact with the target to a larger extent than pushing 
 callback calls to it.

It does, though. That's essentially how iterparse works. I.e. the API between 
parser and target that I'm using here is already there. And has been for years. 
And I think that the feature in question shows that this was a particularly 
good design. Thank you Fredrik!


 You propose to add a completely different mode of operation to it, controlled 
 by a constructor flag. In collect_events mode, XMLParser undergoes a 
 transformation.

We clearly disagree here.


 Coupled with weird targets, its semantics become totally unclear.

Try to come up with a patch for your own proposal that avoids that. You will 
notice that you will be using the existing protocol, thus the target will 
continue to be able to do what it wants. If it's a weird target, then it's a 
weird target because the user said so. But it's the target's very own concern 
to do expected things (build a tree) or do weird things (whatever an example 
for this is).

Except when you disallow using user defined targets, but then, that's an 
arbitrary restriction that ignores what is there already.

As long as you want to keep up the existing separation of concern between 
parser and target (and we agreed on that being a good thing), and as long as 
you don't impose arbitrary (and arbitrarily complex) restrictions on user code, 
you will not be able to prevent the target from doing weird things that 
impact the events being collected by whatever interface you choose for this. 
And why should you?


 XMLParser is not supposed to be an isolated entity. Users should be able to 
 implement custom parsers that provide its API. When the API is simple - 
 feed/close and some callbacks, replacing/ehnancing/faking it is easy. When 
 the API is complex -- two different modes of operation -- it's considerably 
 more difficult.

That's only true when you use it for incremental event parsing. If you don't, 
that part of the API doesn't have to bother you. If you want to implement a 
custom parser that doesn't support event collection, then simply don't support 
the collect_events keyword argument and you're done. Implement read_events() 
by returning an empty iterator, if you like, that's up to you. If you want to 
write a target that prevents parsers from keeping tree fragments alive in 
memory, simply return None from your callback methods and you're done. If you 
want to write a target that changes the Elements that the parser collects, then 
you can do that. Yes, those are actually features. And you could already do 
most of this before.

Remember that it's the users using the parser and providing (or implementing) 
the target. We should let them.


I'm sorry if I sound offensive, but most of what I read as counter arguments in 
this ticket so far was either incomplete or (sometimes) even plain wrong. Some 
seemed pretty close to FUD, but maybe that's just me (and I'm guilty here too, 
I guess - sorry for that). Now, how could this discussion possibly give the me 
impression that everyone has the same level of understanding? Even your comment 
about weird targets, right above, after going through all of this discussion, 
makes me wonder if you really know how all of this works.

Please take a closer look at the original code before Antoine changed it. Play 
with the parser targets a bit. That will help you understand this issue better.


Also, if you want to come up with a patch that implements approach 3), i.e. a 
class between parser and target, then please do. I tried it (the incomplete and 
partially fake patch is attached to this ticket), and my take on it is that 
it's too complex and requires some rather non-trivial changes, including user 
visible ones. So, based on that experience, I decided that it would be better 
to keep things simple and integrate the feature more closely with what is there 
anyway. I think it might help if you went through the same experience.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Nick Coghlan

Nick Coghlan added the comment:

The whole point of the new API is not to replace XMLParser, but to
provide a convenience API to set up a particular combination of an
XMLParser with a particular kind of custom target. It just happens
that actually *implementing it* that way doesn't work right with the
current code. It is *not* necessary (or even appropriate) for it to
serve as a new building block, but as a convenience API to compose
*existing* building blocks in a particular way (or should be, except
that composition doesn't currently work right).

Once the underlying deficiencies are fixed, then the event capturing
target can be exposed directly, but in the meantime, the *composed*
version can be exposed by relying on internal APIs.

We could call the new class XMLParserWithEventCapturingTarget, but
that's a little clumsy :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Stefan Behnel

Stefan Behnel added the comment:

 The whole point of the new API is not to replace XMLParser, but to provide a 
 convenience API to set up a particular combination of an XMLParser with a 
 particular kind of custom target.

Ok, but I'm saying that we don't need that. It's all there, it all comes 
together at the interfaces of the parser. The two-way communication between 
parser and target already exists and is used by iterparse().

So, what I'm advocating is that we should not complicate the module interface 
with a new class (and eventually two classes) at all. Instead, we should just 
expose the *existing* interface of the parser in two ways, once as callbacks 
and once as collected events. I find that much easier to explain than any of 
the other proposals I've seen so far.

This is really not about doing it this way because it's technically too 
difficult to do differently. It's about keeping things simple for users and 
well integrated with what's there.

BTW, Eli asked for working code before we discuss. I've provided it. Now I 
would like to see working code for the proposed three-level design in order to 
make sure it actually works and feels right. I've already said that I've tried 
it and it didn't feel right. We can continue to discuss hot air for another 
month or two, or we can stick to discussing real code and real APIs.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Eli Bendersky

Eli Bendersky added the comment:

I'm sorry if I sound offensive, but most of what I read as counter
 arguments in this ticket so far was either incomplete or (sometimes) even
 plain wrong. Some seemed pretty close to FUD, but maybe that's just me (and
 I'm guilty here too, I guess - sorry for that). Now, how could this
 discussion possibly give the me impression that everyone has the same level
 of understanding? Even your comment about weird targets, right above,
 after going through all of this discussion, makes me wonder if you really
 know how all of this works.

 Please take a closer look at the original code before Antoine changed it.
 Play with the parser targets a bit. That will help you understand this
 issue better.


Ah, awesome. Well, you can't say I didn't warn you, can you?

I am done discussing this issue with you, Stefan.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Stefan Behnel

Stefan Behnel added the comment:

Eli, I agree that we've put way more than enough time into the discussion by 
now. We all know each other's arguments and failed to convince each other.

Please come up with working code that shows that the approach you are 
advocating for the final implementation is doable and also helpful from an API 
user's point of view.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Nick Coghlan

Nick Coghlan added the comment:

Stefan, your proposed merged design isn't going to happen. Two alternative
ways of using the one class is far more complicated to explain to users
than exposing a helper that composes the existing API components to address
a particular use case (particularly when the helper is hiding the fact that
the theoretical underlying composition doesn't currently work the way it
should according to the public APIs).

As a higher level helper, users that are comfortable with and prefer the
lower level API are then clearly free to ignore. This recasting of the
nature of the API as a composition of the lower level components is a
*direct* response to your concern about it being a capability that the
lower level API should support directly.

You clearly don't like that design choice, but this isn't a democracy.
Arguing further on this point will just be ranting for the sake of ranting,
rather than having any chance of changing Eli's mind as module maintainer.

As far as naming goes, I still think it's better to drop Parser from the
name entirely. Call it XMLEventReader or something like that. The point
is to use it when you want to get the events out without blocking when no
events have been triggered and don't want to set up your own class to
receive event notifications through the target push API. The docs should
make it clear that this API is pull only - if people need callbacks or
other push triggers, then that's still what custom targets are for.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Nick Coghlan

Nick Coghlan added the comment:

Sorry, Stefan, I missed your last comment before posting mine. It appears
you had already reached the same conclusion I had regarding further high
level design discussion being pointless :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Eli Bendersky

Eli Bendersky added the comment:

W.r.t. the summary in http://bugs.python.org/msg196177, after some further chat 
with Nick, a name that sounds good for us for the class is XMLPullParser. It 
has both XML and Parser in it, so the intention is clear. On the other hand, 
Pull works well for analogy with the existing pull API in xml.dom.pulldom, 
*and* suggests that the push API is inactive and gets redirected to a pull API 
(thanks Nick for this insight). 

The method names remain as suggested there.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-27 Thread Stefan Behnel

Stefan Behnel added the comment:

 the push API is inactive and gets redirected to a pull API

Given that this is aimed to become a redundant 'convenience' wrapper around 
something else at a point, I assume that you are aware that the above is just 
an arbitrary restriction due to the fact that the parser does not accept 
arbitrary targets as input.

Also, the refrence to pulldom is a bad one, because pulldom reads data as you 
request it (pull). This class creates events based on the data you *push* 
into it. So it's really a push API from the POV of the user, not so much a pull 
API.

Anyway, it should be possible to emulate both this class and the pie-in-the-sky 
three-level API based on what I suggested, so I can't see this addition being a 
problem. I may even end up making the same visual split in lxml.etree 
deliberately, in order to make it clear that this feature only applies to the 
push parser API, i.e. when using feed() and close(), as opposed to what parse() 
and fromstring() use. That would mean that there'd also be a corresponding 
class for the HTMLParser then, and both would inherit from the base parsers and 
thus obviously (and conveniently) accept arbitrary parser targets.

I think I'm ok with adding that class. I liked Greg Ewing's suggestion of 
calling it AsyncParser, though, i.e. AsyncXMLParser. It makes it obvious what 
the use case is and avoids any references to push or pull that will always 
confuse some users, based on what part of the feature they are focussing on in 
their code.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

BTW, I also like how short and clean iterparse() becomes when you move this 
feature into the parser. It's basically just a convenience function that does 
read(), feed(), and yield-from. Plus the usual bit of bolerplate code, 
obviously.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Nick Coghlan

Nick Coghlan added the comment:

Eli's summary left out an exchange between us that happened after he'd already 
written the summary - he pointed out the same problem with the EventParser name 
that you noticed: it's really an alternative XMLParser that exposes 
read_events(), rather than an event parser. So a completely different name like 
EventScanner or EventStream may be more appropriate. (I quite like 
EventStream, since it further suggests the destructive nature of read_events()).

As for why we think it's worth keeping this as a separate API, it's really 
about turning XMLParser's push API for events (where the events are pushed 
into the target object by the parser calling the appropriate methods), into an 
iterparse style pull API where the events can be retrieved via calls to 
read_events().

The back end implementation for the event streaming API *should* be a custom 
target object combined with a regular XMLParser object, but there are 
implementation issues currently preventing that.

We also discussed adding the event streaming interface directly to XMLParser, 
but I agreed with Eli that it's worth keeping that base API simple, especially 
since in the long run we want the new class to just be a convenience API for 
combining XMLParser and a custom target object, even if it can't be implemented 
that way right now.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 iterparse's parser argument will be deprecated

No need to do that. Update the docs, yes, but otherwise keep the possibility to 
improve the implementation later on, without going through a deprecation + 
dedeprecation cycle. That would just confuse users IMHO.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Nick Coghlan

Nick Coghlan added the comment:

Since parsers don't support changing the target after creation, I think it 
makes sense to deprecate passing in a parser *instance*, and instead require 
passing in a callback that accepts the target to use and *returns* an 
appropriate parser object. The parser can only use the default TreeBuilder as 
a target. line in the iterparse docs is a sign that this is a more appropriate 
API - if iterparse needs a particular kind of target, the API should be 
designed so iterparse *provides* that target instead of saying please don't 
use a custom target type.

However, I don't think it makes sense to provide such a callback based API 
until it is actually possible to implement the streaming event API using a 
custom target object with XMLParser.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

I don't see adding one method to XMLParser as a design problem. In fact, it's 
even a good design on the technical side, because if ET ever gains an 
HTMLParser, then the implementation of this feature would be highly dependent 
on the underlying parser, i.e. it would be very different from the 
implementation in XMLParser. It might not even be supported at all. Moving the 
API into the parser gives us that choice quite naturally.

It also solves the naming issue. Why add another thing to the module that needs 
explanation (and even an explanation why it's there and how it's different from 
what else is there), when we can just add the feature to what's there and be 
done?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 ... instead require passing in a callback that accepts the target ...

That could be the parser class then, for example, except that there may be 
other options to set as well. Plus, it would not actually allow iterparse to 
wrap a user provided target. So, in fact, that approach doesn't match with the 
design of introducing a target wrapper.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 it's really about turning XMLParser's push API for events (where the events 
 are pushed into the target object by the parser calling the appropriate 
 methods), into an iterparse style pull API where the events can be retrieved 
 via calls to read_events().

Sorry, but this is ignoring the fact that the target is an *inherent* part of 
this feature. In fact, the push API of XMLParser forms an integral part of the 
design. The events that are being collected are a mixture of what the XMLParser 
produces and what the target produces. You can't think this feature without 
these two parts.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 in the long run we want the new class to just be a convenience API for 
 combining XMLParser and a custom target object, even if it can't be 
 implemented that way right now.

Just to be clear: I changed my opinion on this one and I no longer think that 
it is a good idea. The negative impact on what's there currently is just too 
large, specifically the user visible deprecation and the change to the target 
object API.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Eli Bendersky

Eli Bendersky added the comment:

On Sun, Aug 25, 2013 at 10:40 PM, Stefan Behnel rep...@bugs.python.orgwrote:


 Stefan Behnel added the comment:

 Hmm, did you look at my last comment at all? It solves both the technical
 issues and the API issues very nicely and avoids any problems of potential
 future changes. Let me quickly explain why.

 The feature in question depends on two existing parts of the API: the
 event generation of the parser, and the return values of the parser target
 (e.g. a tree builder). So there are really only three places where this
 feature makes sense, both technically and API-wise.

 1) in the target
 2) in the parser
 3) between parser and target

 Note how a separate class is ruled out right from the start by the fact
 that the feature lives somehwere between parser and target. It's an
 inherent part of the existing design already (and of the implementation,
 BTW), so I don't see how adding a separate thing to control it makes any
 sense.

 1) is impossible because the target is user provided and we do not control
 it
 2) works fine because the parser controls both the call to the target and
 its return value
 3) would be nice (and was my original favourite) but is hard to do with
 the current implementation and requires further changes to the API of
 parser targets

 So 2) is the choice that remains.


I think folding it all into XMLParser is a bad idea. XMLParser is a fairly
simple API and I don't want to complicate it. But more importantly,
XMLParser knows nothing about Elements, at least in the direct API of
today. The one constructing Elements is the target. The read_events
method proposed for the new class (currently IncrementalParser.events)
already returns Elements, having used a TreeBuilder to build them.
XMLParser emits start/end/data calls into the target, but these only carry
tag names, attributes and chunks of data. The hierarchical element
construction is done by TreeBuilder.

What I actually think would be better for the long term is to add new
target invocations in XMLParser - start-ns and end-ns. So XMLParser would
just keep *parsing*, leaving the interpretation of the parsed data to the
target. Today's TreeBuilder is free to ignore these calls. A custom
EventCollectingTreeBuilder can collect an event list, having all the
information at its disposal. Thus, XMLParser would remain what it is today
(minus the _setevents hack) - a router for pyexpat events.

These discussions of the future API are interesting, but what's more
important today is to have an API for IncrementalParser (using this name
before a new one is agreed upon) that doesn't block future implementation
changes. And I believe the API proposed here fits the bill.

  The class will be named EventParser.

 Obviously because it's parsing Events, as opposed to the XMLParser, which
 parses XML, or the HTMLParser, which parses HTML, right?


The name is not perfect, and proposals for a better one are welcome. FWIW,
since it already lives in the xml.etree namespace, XML does not
necessarily have to be part of the name. So, some alternatives:

* EventStreamer - proposed by Nick. I have to admit I don't feel good with
it, because I still want to be crystal clear it's a *parser* we're talking
about.
* EventBasedParser
* EventCollectingParser
* NonblockingParser
* ... other ideas?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

 XMLParser knows nothing about Elements, at least in the direct API of today. 
 The one constructing Elements is the target.

Absolutely. And I'm 100% for keeping that distinction exactly as it is.


 The read_events method proposed for the new class (currently 
 IncrementalParser.events) already returns Elements, having used a TreeBuilder 
 to build them.

More precisely, it only returns Elements *iff* the TreeBuilder builds them. If 
it does not, then it returns something else. By moving the desired 
functionality into the parser, we don't even need to change anything about the 
interface between the parser and the target object. We still can, though, if 
you want to extend the interface with start-ns and end-ns events (and I'm ok 
with that, but it's a different feature). We do not loose that option. But the 
cool thing is that we don't have to do this now, and that iterparse just keeps 
working as it is and can be fixed later. No deprecation needed.

So we can easily agree on the goals of keeping the interface of the XMLParser 
simple and not teaching it about Elements. But we still disagree about the 
conclusions. My conclusion is that the API is substantially simpler if we do 
*not* add an entire new class that just duplicates existing APIs, but keep the 
parser as the thing that generates parse events. Be they in the form of 
callbacks or in the form of event tuples (that have the same name as the 
callbacks, BTW). The cool feature is that you can use either of the two 
interfaces or even hook into one to control the other (once the C parser is 
fixed), without having to learn the distinction between an XMLParser and a 
WhateverNewParser that also just parses XML.


 I still want to be crystal clear it's a *parser* we're talking about

You have to decide what you want. IMHO, there is no use in putting a new parser 
next to the existing XMLParser if both are there for parsing XML. That is just 
unnecessarily confusing. If you want it to be a parser, use the XMLParser.


I guess there's no other way to convince you than by coding up my proposal. It 
seems to be hard to properly explain it without seeing it at work.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

Here is a proof-of-concept patch that integrates the functionality of the 
IncrementalParser into the XMLParser. I ended up reusing most of Antoines 
implementation and test suite. In case he'll look back into this ticket at some 
point, I'll put a thank you here.

The patch certainly needs more cleanup, but it shows where things are going and 
passes all tests.

--
Added file: 
http://bugs.python.org/file31475/integrate_IncrementalParser_into_XMLParser.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

(I still wonder why I'm the one writing all the patches here when Eli is the 
one who actually wants this feature ...)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Stefan Behnel

Stefan Behnel added the comment:

BTW, maybe read_events() still isn't the ideal method name to put on a parser.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Eli Bendersky

Eli Bendersky added the comment:


 (I still wonder why I'm the one writing all the patches here when Eli is
 the one who actually wants this feature ...)


Well, obviously because you're the only real programmer around here and the
rest of us are just a bunch of hand-wavy morons.

Seriously, Stefan, lose the attitude. Your unpleasant approach to
conducting a conversation completely obscures your technical contributions
and capabilities. Antoine got fed up a while ago, and I'm also nearing that
point.

With that off my chest, I will look at this new patch, hopefully tomorrow.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Ethan Furman

Changes by Ethan Furman et...@stoneleaf.us:


--
nosy: +ethan.furman

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Ethan Furman

Ethan Furman added the comment:

Stefan Behnel wrote:
 ... I still wonder why I'm the one writing all the patches ...

I imagine for the same reasons that I offered to write Enum:  I had definite 
ideas about how it should be, it is sometimes easier to explain with working 
code than with prose, and I wanted to contribute to Python.

I have to agree with Eli, though.  Wading through the accusations to get to the 
ideas is not helping your cause.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-26 Thread Eli Bendersky

Eli Bendersky added the comment:

On Mon, Aug 26, 2013 at 12:35 PM, Stefan Behnel rep...@bugs.python.orgwrote:


 Stefan Behnel added the comment:

 Here is a proof-of-concept patch that integrates the functionality of the
 IncrementalParser into the XMLParser. I ended up reusing most of Antoines
 implementation and test suite. In case he'll look back into this ticket at
 some point, I'll put a thank you here.

 The patch certainly needs more cleanup, but it shows where things are
 going and passes all tests.


OK, I looked at the patch. So I think I understood your verbal proposal
correctly earlier. And as I said then, I don't like it. I believe Nick has
summarized nicely why heaping everything into XMLParser is a bad idea, but
since you went the extra mile and made a patch, I will try to provide a
more expanded rationale.

Putting _setevents aside for the moment, XMLParser is a clean and simple
API. Its output is only push (by calling callbacks on the target). It
doesn't deal with Elements at all. It decomposes the input XML and produces
push events. That's it. It doesn't even have to interact with the target to
a larger extent than pushing callback calls to it.

You propose to add a completely different mode of operation to it,
controlled by a constructor flag. In collect_events mode, XMLParser
undergoes a transformation. It becomes a pull API. It starts interacting
with the target more deeply by using it to produce elements for events.
Coupled with weird targets, its semantics become totally unclear.

So it's a single class that acts completely differently based on an
argument. This is very unpythonic. But things get worse. XMLParser is not
supposed to be an isolated entity. Users should be able to implement custom
parsers that provide its API. When the API is simple - feed/close and some
callbacks, replacing/ehnancing/faking it is easy. When the API is complex
-- two different modes of operation -- it's considerably more difficult.

That's the gist of it. Sticking to the current design of separation of
concerns between the parser and the target/treebuilder is IMO the better
way looking forward. In the mean-time, a helper class for asynchronous
parsing is useful and timely. By restricting its API, changing its
underlying implementation will be easy in the future.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:


1. Why have the event builder wrap a tree builder? Can't it just be a 
separate target?


You need a TreeBuilder in order to build the tree for the events.

If you want to use a different target than a TreeBuilder, you're free to do so. 
That's the idea of wrapping.



2. Instead of the stock XMLParser recognizing the event builder via isinstance 
and giving it special treatment, would it not be better to still have a 
separate class that implements the XMLParser interface (it can derive from it 
or just use duck typing)?


Why would you want to restrict it to an XMLParser? If there was an HTMLParser 
(which exists in lxml.etree), then how would you make the two interact?

The isinstance() check was only a quick way to get the API working without 
changing the implementation in the back too much. The actual functionality 
should (eventually) be moved into the target itself. Currently, it is 
implemented internally in the parser inside of the C module. That's the main 
problem with the current implementation that needs fixing. See issue #17902 
(which is not a documentation issue but a real issue, BTW).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

Looking at this more, the Parser/Builder abstraction in ET leaks badly, 
especially when it comes to incremental parsing. This manifests in (at least) 
two ways:

1. The parser accepted by iterparse has to use the ET-provided TreeBuilder, 
because the C implementation of TreeBuilder has undocumented fields and hooks 
used directly by the parser.
2. Even though iterparse  IncrementalParser accept a parser argument, it 
cannot be just any parser. This parser (in addition to using the ET 
TreeBuilder) must implement an undocumented and obscure _setevents method; this 
makes the original intent of the abstraction - hooking up arbitrary parsers to 
the interface, hard to follow.

To *really* fix this, all these capabilities have to be exposed as APIs. 
TreeBuilder has to support an explicit API for collecting and reporting events. 
XMLParser has to call into this API and either not have _setevents at all or 
have something public and documented. Note also that event hookup in the parser 
makes sense in a way, because we don't want events to be fired when not needed 
(for performance). So we don't want any parser to hook up all events for 
firing, and only incremental tree builders to collect them. So things aren't 
quite *so* simple. In addition, since iterparse (an existing, stable API of the 
module) expects a parser argument, it won't be easy creating an 
IncrementalTreeBuilder, because how would we let people using custom parsers 
with iterparse not change any code and yet keep things working?

--

Since the above presents considerable changes in the internals, to get *really* 
right - we do have a problem for the upcoming release because I'm not sure if 
anyone has the time or will to do it. However, the feature is interesting and 
useful, and the code is cleaner than it used to be. So I propose the following 
compromise that can serve us for the 3.4 release:

IncrementalParser stays, but its methods will be renamed to feed  close, which 
is consistent with the current XML parsing APIs (including 
xml.sax.xmlreader.IncrementalParser). In a future release, we may decide to 
create a IncrementalTreeBuilder anyway but then IncrementalParser can be 
layered on top of it and offered as a convenience.

--

To conclude, I don't think that when looked at as exposing an implementation 
detail of iterparse, IncrementalParser is so horrible. iterparse is already its 
own kind of thing somewhat alien to ET, but it's very useful and important. 
IncrementalParser is just replacing iterparse's parse whole source behavior 
with incremental feeds followed by a close. This is similar to the APIs 
xml.sax.xmlreader is exposing, so there is a consistency here. Solving all the 
implementation problems would certainly be great, but unfortunately no one 
seems to have time for this at the moment.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

I attached a patch that removes the IncrementalParser class and merges its 
functionality into the _IterParseIterator. It thus retains most of the 
refactoring without adding new functionality and/or APIs.

I did not take a look if anything else from later changes needs to be reverted, 
neither did I update the documentation. I guess the newly added doc sections 
can just be deleted.

One thing that is still dubious is the requirement for the _setevents() method 
on the parser instance. That was already a flaw before Antoine's change, 
although it only hit if the C parser was used. It now affects both the Python 
parser and the C parser. It's not a regression because code that relies on it 
not being used is broken when it is being used. However, we should be aware 
that we are promoting a quirk to a visible API here. I don't see it as a real 
problem, given that it's an internal API (and thus an implementation detail), 
that's why I left it in. It should eventually be replaced by a proper setup 
based on the parser target.

--
Added file: http://bugs.python.org/file31460/remove_IncrementalParser.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

Thanks for the effort, Stefan, but I still think IncrementalParser is worth 
keeping. It provides functionality that doesn't currently exist in ET, and IMHO 
this functionality is both important and useful.

Renaming its methods to feed  close kills the API naming inconsistency vs. 
XMLParser and SAX's IncrementalParser. As for future directions, having it does 
not add additional constrains to those that already exist because of iterparse 
accepting a custom XMLParser creates compatibility problems already.

For the latter, I think I can also beef up the documentation a bit to be more 
explicit (such as requiring the XMLParser provided to iterparse to derive from 
ET.XMLParser; otherwise, it just won't have the required _setevents).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:


TreeBuilder has to support an explicit API for collecting and reporting events. 
XMLParser has to call into this API and either not have _setevents at all or 
have something public and documented. Note also that event hookup in the parser 
makes sense in a way, because we don't want events to be fired when not needed 
(for performance). So we don't want any parser to hook up all events for 
firing, and only incremental tree builders to collect them.


That's why I wanted to introduce the wrapper class. The idea was to extend the 
supported methods on the target object by those needed for iterparse(), 
specifically start_ns() and end_ns(). That way, the event callback setup would 
be just natural.



In addition, since iterparse (an existing, stable API of the module) expects a 
parser argument, it won't be easy creating an IncrementalTreeBuilder, because 
how would we let people using custom parsers with iterparse not change any code 
and yet keep things working?


Yep, that's a problem in the original design.

One way to fix it would be to allow changing the parser target after the 
creation of the parser, but before it actually started parsing. That could be 
an entirely internal feature that would then allow injecting the wrapper 
between the parser and the target after the fact. 

Given that the current support for using different parsers in iterparse() is 
pretty much non-existant, requiring the applicable parsers to support this 
feature would not be a regression, IMHO.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

 I still think IncrementalParser is worth keeping.

If you want to keep it at all cost, I think we should at least hide it behind a 
function (as with iterparse()). If it's implemented as a class, chances are 
that people will start relying on internals by inheriting from it. And we 
already know that those internals should eventually be removed completely. 
Eventually, that function should become a two-liner or so.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

 Stefan Behnel added the comment:

  I still think IncrementalParser is worth keeping.

 If you want to keep it at all cost, I think we should at least hide it
 behind a function (as with iterparse()). If it's implemented as a class,
 chances are that people will start relying on internals by inheriting from
 it. And we already know that those internals should eventually be removed
 completely. Eventually, that function should become a two-liner or so.


I'm not sure I see how this is different from any class exposed by the
stdlib. Users can inherit it, rely on internals and then we can never
change its implementation? Surely this isn't true. We don't make guarantees
w.r.t. implementation and internals, only interfaces. Yes, Python lets you
monkey-patch anything, but this doesn't mean we can't build stable APIs in
Python and change implementation in the future.

So the only thing we should focus on is the *external* API of this class.
With only feed/close/events exposed, it seems logical that these can be
maintained even when the implementation changes.

Besides, such an incremental parser is a natural match for a class because
it has state and several methods that collectively act on that state.
Redesigning it as a function would be awkward. That said, if you have a
specific suggestion in mind that would be as natural to use in user code,
feel free to offer it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

Actually, let me revise my rpevious comment. I think we should fake the new 
interface for now by adding a TreeEventBuilder that requires having its own 
TreeBuilder internally, instead of wrapping an arbitrary target. That way, we 
can avoid having to clean up the existing dependency of the C parser on the 
C-level TreeBuilder implementation.

Once this cleanup gets done, we can add the support for wrapping user defined 
targets as a feature.

So, basically, I propose to take the route that my TreeEventBuilder patch went, 
just with a simplified implementation. I'll see if I can cut down the patch to 
those essentials.

Would you be ok with something that uses the isinstance() hack in the parser?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

Stefan, I don't mind looking at a working patch; however, it's not clear to me 
how you intend to solve the iterparse-accepting-a-custom-XMLParser problem. As 
for faking the new API, I don't know if that's a good idea because we're not 
yet sure what that new API is. I agree with your earlier observation that we 
must begin by refactoring the internals (in the C implementation) and defining 
cleaner APIs for existing objects. Once we do that, we may have better ideas 
about the ideal API for incremental parsing / event building. But that's a 
large job.

Again, fully working patches will be considered. Lacking those, I'm inclined to 
go with the interim solution proposed in 
http://bugs.python.org/issue17741#msg196133

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

 fully working patches will be considered

Let me remind you that it's not me who wants this feature so badly.


 As for faking the new API, I don't know if that's a good idea because we're 
 not yet sure what that new API is.

If that's your concern, then I suggest not adding the feature at all, as long 
as we don't know if we can (or should) keep up the IncrementalParser facade 
with the revised implementation.

For example, can it accept a user defined parser instance as input or not? Can 
it accept a user defined parser target as input? Can it wrap it or would it 
maybe have to inherit from a TreeBuilder? Should it be a class or a helper 
function? I don't see how the interface you proposed leaves less questions open 
than what I am proposing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

  As for faking the new API, I don't know if that's a good idea because
 we're not yet sure what that new API is.

 If that's your concern, then I suggest not adding the feature at all, as
 long as we don't know if we can (or should) keep up the IncrementalParser
 facade with the revised implementation.

 For example, can it accept a user defined parser instance as input or not?
 Can it accept a user defined parser target as input? Can it wrap it or
 would it maybe have to inherit from a TreeBuilder? Should it be a class or
 a helper function? I don't see how the interface you proposed leaves less
 questions open than what I am proposing.


This is an existing API within ET:

  xml.etree.ElementTree.iterparse(source, events=None, parser=None)

The parser argument is limited: it can't use a custom TreeBuilder. It also
must provide _setevents, so either inherit from XMLParser or just provide
that method in another way [I'll try to improve the documentation to make
all of this explicit]. Whatever we say in this issue, iterparse is here to
stay.

  class xml.etree.ElementTree.IncrementalParser(events=None, parser=None)

Is the new class. Its interface combines iterparse and XMLParser - so it is
subject to the same constraints. It also serves as a basis of iterparse's
implementation.

If the internals are changed, the constrains we'll be subject to with
iterparse will be the same for IncrementalParse; no more. For example, the
solution can be substituting the target of the given parser to the event
builder target. Since we force the target to be TreeBuilder now, it should
not really break user code because that part isn't custom. But whatever the
solution is, it will be the same solution for both IncrementalParser and
iterparse.

Also, even if the new approach is implemented in the next release,
IncrementalParser can stay as a simple synonym to
XMLParser(target=EventBuilder(...)).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

 Also, even if the new approach is implemented in the next release,
IncrementalParser can stay as a simple synonym to
XMLParser(target=EventBuilder(...)).

No it can't. According to your signature, it accepts a parser instance as 
input. So it can't just create a new one internally.



This is an existing API within ET:

  xml.etree.ElementTree.iterparse(source, events=None, parser=None)

The parser argument is limited: it can't use a custom TreeBuilder. It also
must provide _setevents, so either inherit from XMLParser or just provide
that method in another way [I'll try to improve the documentation to make
all of this explicit]. Whatever we say in this issue, iterparse is here to
stay.


I agree. However, I can't see why this should constrain us when adding a new 
API. Sure, the current implementation puts certain restrictions on the input. 
But I see that as an advantage, not as a constraint. It's currently broken in 
what it accepts as input, so we are free to change that aspect later on. And we 
can now try to get it right for the new API that we add.

Also, AFAICT, passing a parser into iterparse() is a very rare thing to do, if 
only because it doesn't really work, so the likelyhood of breaking code is very 
limited (I'm not saying we should, just thinking about the impact). Note that 
lxml.etree doesn't currently support this at all.


Just to throw some more ideas into the wild, what do you think about making 
this an inherent feature of the XMLParser class itself? That can be done now, 
without major changes to the underlying code. It could be enabled by a new 
keyword argument collect_events=None that you could pass into the constructor 
in order to define the set of events. The events() method would always be 
available, but would only yield events if there actually are any. Otherwise, it 
would just terminate and be done. So, unless you pass it a set of event names 
at init time, the parser wouldn't collect events and you wouldn't get any out 
of it. Sounds like a clean interface to me.

Collecting the events would work in the same way as it currently works, i.e. it 
would call the target's start() method, and whatever that returns will be added 
as value to the events as a tuple (start, target_return_value). That means 
that it's up to the parser target to determine what the event iteration 
returns. Can be elements, can be None, can be something else.

As for backwards compatibility with iterparse(), I don't see a problem with 
officially documenting iterparse() as requiring an XMLParser if a parser is 
provided, and as reconfiguring that parser to its needs. It already does that 
anyway and it currently doesn't work for anything but an XMLParser.


Also, on a related note, we might (!) consider the addition of the _setevents() 
method a bug in CPython 3.3. It didn't exist in ElementTree before it was 
merged with cElementTree. Instead, the expected interface was (apparently) that 
the parser that is passed in has a _parser attribute that holds an Expat 
parser, and that iterparse() could reconfigure according for itself. Both are 
not public APIs and even differed between ET and cET, so I don't see any harm 
being done, but it would be good to have one way to do it. OTOH, if that way is 
_setevents(), then be it. The advantage would be that this interface is at 
least independent of Expat and could potentially be implemented by other types 
of parsers.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Jeremy Kloth

Changes by Jeremy Kloth jeremy.kloth+python-trac...@gmail.com:


--
nosy: +jkloth

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

I actually have another idea. Since we all agree that passing a custom parser 
to iterparse is dubious, IncrementalParser does not have to do that at all. 
This will make it a much more future-proof API. So its signature can be:

  class xml.etree.ElementTree.IncrementalParser(events=None)

The only remaining question is how will iterparse then work based on 
IncrementalParser (since iterparse does accept a parser). iterparse can just 
set the parser on IncrementalParser after creating it - it's an internal 
contract that does not have to be exposed.

This will be better than the current approach in terms of future-proofing.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Eli Bendersky

Eli Bendersky added the comment:

I had a chat with Nick about this, and a variation of the proposal in 
http://bugs.python.org/msg196133 seems acceptable to us both. To summarize, 
here's what goes into 3.4.

* The class will be named EventParser.
Rationale: The Incremental in IncrementalParser is ambiguous, because it's not 
clear which side is incremental - input or output. In addition, XMLParser is 
already incremental in the sense that it has a feed() method. EventParser is 
(although not a perfect name) clearer w.r.t what it does - produces events from 
the parsing. Another (milder) reason is to avoid confusing duplication with 
xml.sax.xmlreader.IncrementalParser, which is somewhat different. Yet another 
is that if we eventually figure out how to implement this in a parser target, a 
great name for that target would be EventBuilder, which is a logical building 
block for EventParser.

* It will *not* accept a parser argument in the constructor.
Rationale: the parser argument of iterparse is broken anyway. This will make it 
much easier to modify the implementation of EventParser in the future when the 
C internals are fixed w.r.t problems mentioned in this issue.

* iterparse's parser argument will be deprecated, and the documentation will 
be more detailed w.r.t to the limitations on its current parser argument (the 
limitations are there in the code, but they're not fully documented).

* EventParser's input-side methods will be named feed and close, to be more 
consistent with existing XML APIs (both XMLParser and 
xml.sax.xmlreader.IncrementalParser).

* EventParser's output-side method will be called read_events.
Rationale: read_ helps convey that the consumption of events is destructive. 
A possible future addition of a complementary, non-destructive API - 
peek_events will be considered.

* Documentation will be modified to clarify exactly how EventParser differs 
from XMLParser and iterparse.

Also, new issue(s) will be created to address fixing specific implementation 
problems.



Nick, feel free to add/correct if I misunderstood or forgot something.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Nick Coghlan

Nick Coghlan added the comment:

Sounds good to me :)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-25 Thread Stefan Behnel

Stefan Behnel added the comment:

Hmm, did you look at my last comment at all? It solves both the technical 
issues and the API issues very nicely and avoids any problems of potential 
future changes. Let me quickly explain why.

The feature in question depends on two existing parts of the API: the event 
generation of the parser, and the return values of the parser target (e.g. a 
tree builder). So there are really only three places where this feature makes 
sense, both technically and API-wise.

1) in the target
2) in the parser
3) between parser and target

Note how a separate class is ruled out right from the start by the fact that 
the feature lives somehwere between parser and target. It's an inherent part of 
the existing design already (and of the implementation, BTW), so I don't see 
how adding a separate thing to control it makes any sense.

1) is impossible because the target is user provided and we do not control it
2) works fine because the parser controls both the call to the target and its 
return value
3) would be nice (and was my original favourite) but is hard to do with the 
current implementation and requires further changes to the API of parser targets

So 2) is the choice that remains.


 It will *not* accept a parser argument in the constructor.

Yes, I had also arrived at that point yesterday. However, I then reconsidered 
why there has to be a separate class in the first place. Basically, if you 
can't allow users to control the parser because the feature is already part of 
the parser, then why expose it as something that appears to be independent? 
Adding it to the existing API is very clean and exposes the feature where users 
would look for it.


 EventParser's output-side method will be called read_events.

Makes sense to me. I had also considered that problem but didn't come up with a 
good name yet. read_events reads better than plain events.


 The class will be named EventParser.

Obviously because it's parsing Events, as opposed to the XMLParser, which 
parses XML, or the HTMLParser, which parses HTML, right?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-24 Thread Nick Coghlan

Nick Coghlan added the comment:

Using tulip-inspired method names (when tulip hasn't landed) to duplicate 
existing data input functionality (feed() and close()) seems a rather dubious 
design decision to me.

Given how popular lxml.etree is as an alternative to the standard library's 
etree implementation, we shouldn't dismiss Stefan's design concerns lightly.

--
nosy: +ncoghlan

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-24 Thread Nick Coghlan

Nick Coghlan added the comment:

Reopening this as per discussion on python-dev. I haven't reverted anything at 
this point, as subsequent changes mean a simple hg backout is no longer 
sufficient.

--
resolution: fixed - 
stage: committed/rejected - needs patch
status: closed - open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-24 Thread Eli Bendersky

Eli Bendersky added the comment:

I like the idea of having .events() in a special target passed to a XMLParser, 
which already has feed() and close(). I read through Stefan's proposal and 
there are a couple of issues I wanted to raise:

1. Why have the event builder wrap a tree builder? Can't it just be a 
separate target?
2. Instead of the stock XMLParser recognizing the event builder via isinstance 
and giving it special treatment, would it not be better to still have a 
separate class that implements the XMLParser interface (it can derive from it 
or just use duck typing)?

Note that (2) brings us closer to Antoine's original design, albeit with 
different method names and somewhat different underlying implementation.

Also, Stefan, could you clearly explain the specific implementation issue you 
ran into with your partial patch. You mentioned an unrelated bug that 
can/should be solved before such an implementation can work. Can you provide 
more details?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Stefan Behnel

Stefan Behnel added the comment:

Ok, so what are we going to do for the next alpha?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Roundup Robot

Roundup Robot added the comment:

New changeset 815f328a6578 by Antoine Pitrou in branch 'default':
Issue #17741: use composition, rather than inheritance, for 
xml.etree.iterparse's result class.
http://hg.python.org/cpython/rev/815f328a6578

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

I've committed the inheritance removal patch. I personally don't think there's 
anything more to do, since the TreeEventBuilder proposal stumbled on some 
implementation issues.

--
status: open - closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Stefan Behnel

Stefan Behnel added the comment:

I was asking for the current implementation to be removed until we have a 
working implementation that hurts neither the API nor the module design.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 I was asking for the current implementation to be removed until we
 have a working implementation that hurts neither the API nor the
 module design.

It would help if we could keep the discussion on rational terms.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Stefan Behnel

Stefan Behnel added the comment:

I don't think I understand what you mean.

In any case, it's not to late to remove the implementation. There was only one 
alpha release so far that included it, so it can't really break any existing 
code that relies on it. The longer we wait, the more damage will be done. 
That's why I am asking for removal now.

I would prefer not to rush in a replacement. The current state shows where that 
brings us. So, I request the removal of the current code, then we can calmly 
come up with a good implementation of the feature in question. If that can't be 
done for 3.4, well, then we have at least avoided adding something that will 
need to be deprecated and replaced later on. If it can be done for 3.4, the 
better.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 In any case, it's not to late to remove the implementation.

As far as I can say, there's no reason to remove the implementation
except for your distaste of the method names.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

To wrap things up:
- the feature is desireable
- the API is reasonable
- the implementation is functional and properly tested

Really, if you can't come up with an actual blocker, I'll ask you to leave this 
issue at rest.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Stefan Behnel

Stefan Behnel added the comment:

Could we please keep the discussion on rational terms?

It's not just the method names. The problem is that you are duplicating an 
existing class (the XMLParser) for no good reason, instead of putting the 
feature where it belongs: *behind* the XMLParser. Even worse, you are 
duplicating the class and giving it a *different* API.

This goes completely against the design of the existing API and counters the 
reusability of the existing code.

I'm not questioning that the feature is desirable. I'm saying that the design 
is wrong.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
nosy:  -pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 The problem is that you are duplicating an existing class (the
 XMLParser) for no good reason

No. The logic was already in _IterParseIterator, just wrapped up
differently.

And the proof is that the original commit, while adding a feature,
*removed* code from xml.etree:

$ hg di -c f903cf864191 --stat Lib/xml/etree/
 Lib/xml/etree/ElementTree.py |  209
+-
 1 files changed, 97 insertions(+), 112 deletions(-)

 This goes completely against the design of the existing API and
 counters the reusability of the existing code.

No it doesn't. It irritates you, for some reason, but your feelings are
not a valid reason to revert a useful feature. And this issue is not for
you to repeat the same stubborn line of argument again and again. I'm
extremely bored, and I won't care about your comments anymore.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-23 Thread Stefan Behnel

Stefan Behnel added the comment:

Given that it seems to be hard to come to a consensus in this ticket, I've 
asked for removal of the code on python-dev.

http://mail.python.org/pipermail/python-dev/2013-August/128095.html

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-10 Thread Stefan Behnel

Stefan Behnel added the comment:

I gave the implementation a try and attached an incomplete patch. Some tests 
are failing.

It turns out that it's not entirely easy to do this. As Antoine noticed, the 
hack in the C implementation of the TreeBuilder makes it tricky to integrate 
with other targets, specifically when intercepting events that the tree builder 
needs . That's a more general bug and it currently gets in the way.

Anyway, here's how it should eventually look like.

--
Added file: http://bugs.python.org/file31211/iterparse_TreeEventBuilder.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Here is a patch that cleans up the current implementation to avoid making the 
result of iterparse() an IncrementalParser (with all of its new API).

Please note that the tulip mailing list is not an appropriate place to discuss 
additions to the XML libraries, and ElementTree in particular.

--
nosy: +scoder
Added file: http://bugs.python.org/file31205/iterparse_cleanup.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Changes by Stefan Behnel sco...@users.sourceforge.net:


--
components: +XML

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Copying the discussion between Antoine and me from python-dev:

 IMO it should mimic the interface of the TreeBuilder, which
 calls the data reception method feed() and the termination method
 close(). There is no reason to add yet another set of methods names
 just to do what others do already.

 Well, the difference here is that after calling eof_received() you can
 still (and should) call events() once to get the last events. I think
 it would be weird if you could still do something useful with the object
 after calling close().

 Also, the method names are not invented, they mimick the PEP 3156
 stream protocols:
 http://www.python.org/dev/peps/pep-3156/#stream-protocols

I see your point about close(). I assume your reasoning was to make the 
IncrementalParser an arbitrary stream end-point. However, it doesn't really 
make all that much sense to connect an arbitrary data source to it, as the 
source wouldn't know that, in addition to passing in data, it would also have 
to ask for events from time to time. I mean, you could do it, but then it would 
just fill up the memory with parser events and loose the actual advantages of 
incremental parsing. So, in a way, the whole point of the class is to *not* be 
an arbitrary stream end-point.

Anyway, given that there isn't really the One Obvious Way to do it, maybe you 
should just add a docstring to the class (ahem), reference the stream protocol 
as the base for its API, and then rename it to IncrementalStreamParser. That 
would at least make it clear why it doesn't really fit with the rest of the 
module API (which was designed some decade before PEP 3156) and instead uses 
its own naming scheme.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Actually, let me take that last paragraph back. There is an Obvious Way to do 
it, and that's the feed() and close() methods. They are the existing and 
established ElementTree interface for incremental parsing. The fact that 
close() doesn't clean up all state is IMHO a minor issue. The state will be 
cleaned up automatically once the iteration terminates, and that's the normal 
behaviour of iterators. So it actually fits both protocols quite nicely.

I'd just leave the stream protocol out completely here. For one, the 
implementation isn't complete anyway (the connection_*() methods don't make 
much sense), and as I said, it's not very useful to consider the parser a 
general end-point to that protocol.

I'd also suggest returning the iterator over the remaining events from close(), 
just like the TreeBuilder returns the tree. That covers the (less common) use 
case of first parsing everything and then processing the events. I'll add 
another patch that does that.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Changes by Stefan Behnel sco...@users.sourceforge.net:


Added file: http://bugs.python.org/file31206/iterparse_api_cleanup.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Thinking about the original patch some more - I wonder why it doesn't use a 
wrapper for TreeBuilder to process the events. Antoine, is there a reason why 
you had to add this _setevents() method to the XMLParser, instead of making the 
IncrementalParser an IncrementalTreeBuilder?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

Oh, and could we reopen this ticket, please?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 Antoine, is there a reason why you had to add this _setevents() method
 to the XMLParser, instead of making the IncrementalParser an
 IncrementalTreeBuilder?

The point is not to build a tree of potentially unbounded size (think XMPP). 
The point is to yield events in a non-blocking way (iterparse() is blocking, 
which makes it useless for non-blocking applications).

An IncrementalTreeBuilder wouldn't have much point IMO. It is not more costly 
to accumulate a string and parse it at the end, than to progressively build a 
growing XML tree.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
status: closed - open

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

About the patch: I think changing the API names now that alpha1 has been 
released is a bit gratuitous. I would like to keep the encapsulation part but 
ditch the naming changes.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 The point is not to build a tree of potentially unbounded size (think
 XMPP). The point is to yield events in a non-blocking way (iterparse()
 is blocking, which makes it useless for non-blocking applications).

Ok, but that's the only difference. Instead of getting the events from the 
parser, you could equally well get them from the TreeBuilder, also in a 
non-blocking way.

Sticking this functionality into a parser target object has the advantage that 
the parser interface wouldn't have to change. So, instead of introducing an 
entirely new parser interface, we'd just add a class that can be used as a 
parser target and provides an additional events() method. That's a 
substantially less invasive API change.


 An IncrementalTreeBuilder wouldn't have much point IMO. It is not more
 costly to accumulate a string and parse it at the end, than to
 progressively build a growing XML tree.

I don't think I understand what you mean.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 About the patch: I think changing the API names now that alpha1 has been
 released is a bit gratuitous.

Sorry for being late, but I can't see it being my fault.

A change in an alpha release is still way better than a change after a final 
release. Eventually, lxml will have to go on par here, so it's better to get it 
right now, before any harm is done.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

  The point is not to build a tree of potentially unbounded size
  (think
  XMPP). The point is to yield events in a non-blocking way
  (iterparse()
  is blocking, which makes it useless for non-blocking applications).
 
 Ok, but that's the only difference. Instead of getting the events
 from the parser, you could equally well get them from the
 TreeBuilder, also in a non-blocking way.

But your TreeBuilder is also growing a tree internally, and therefore
eating more and more memory, right?

I don't see the point of stuffing different kinds of functionality
inside a single class. It makes the intended use less obvious, and
errors more likely. Right now, IncrementalParser has a simple API,
and there's a single way to use it.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 A change in an alpha release is still way better than a change after
 a final release.

But worse than no change at all. Arguing about API naming is a bit futile,
*especially* when the ship has sailed.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 But worse than no change at all. Arguing about API naming is a bit
 futile, *especially* when the ship has sailed.

rantIt's easy to say that as a core developer with commit rights who can 
simply hide changes in a low frequented bug tracker without notifying those who 
have to know about these changes./rant It's pure luck I noticed this change 
during the alpha release cycle.

I'm also not arguing about naming. I'm questioning the design, trying to get it 
into a shape that fits the existing APIs. Why do we need two incremental 
parsing interfaces in one and the same library that use completely different 
method names, although doing otherwise exactly the same? Why should the type of 
the *result* of the parsing process change the method name that you need to 
call to *insert* data into the parser?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

But IncrementalParser uses an XMLParser internally, which in turn uses a 
TreeBuilder internally. So what exactly is your point?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 But IncrementalParser uses an XMLParser internally, which in turn
 uses a TreeBuilder internally. So what exactly is your point?

Well, I would rather like to understand yours. Whatever
IncrementalParser uses internally needn't impact what API it
exposes to the user.

(as a matter of fact, iterparse() doesn't expose a
TreeBuilder-like API, either)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 Well, I would rather like to understand yours.

My point is that the IncrementalParser uses a TreeBuilder that builds an XML 
tree in the back. So I'm wondering why you are saying that it doesn't build a 
tree.


 Whatever IncrementalParser uses internally needn't impact what API it
 exposes to the user.

And in fact, we don't even need an IncrementalParser, because XMLParser already 
*has* an incremental parsing interface. All that's missing is the bit that 
collects and exposes the events. And my point is that we shouldn't duplicate 
the existing *data entry* interface for that, especially not under different 
names for identical functionality, but only add an interface to access those 
events as *output*, i.e. to add the bit of the API that's actually missing.

As an analogon, what would you say if I asked for adding a new, separate 
Mapping interface to Python that uses the method name set_value() instead of 
__setitem__() because I want it to read data from the hard drive and not from 
memory?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

 rantIt's easy to say that as a core developer with commit rights
 who can simply hide changes in a low frequented bug tracker without
 notifying those who have to know about these changes./rant It's
 pure luck I noticed this change during the alpha release cycle.

It is the rule for most stdlib improvements that they go directly
through the bug tracker. Most core developers and outsiders
would feel swamped by the traffic if all feature additions went
through the mailing-list.

I'm honestly baffled that you think I am trying to hide things.
Why do you think I would feel guilty about proposing an addition,
or try to sneak things inside xml.etree?

(yes, we could theoretically run polls for every addition
we propose, collect and discuss the results, and iterate several
times until the outcome is successful; I don't think any of us
has the bandwidth to do that, which is why that practice is
only used for game-making changes (i.e. PEP material))

 I'm also not arguing about naming. I'm questioning the design, trying
 to get it into a shape that fits the existing APIs. Why do we need
 two incremental parsing interfaces in one and the same library that
 use completely different method names, although doing otherwise
 exactly the same?

Well, unless I'm missing something, TreeBuilder doesn't do parsing,
it takes already parsed data: it has a start() method to open a tag,
and a data() method to add raw text inside that tag. IncrementalParser,
OTOH, has a data_received() method to which you pass a piece of
non-parsed XML string.

The other incremental parsing API is actually iterparse(). The
reason I proposed IncrementalParser is that iterparse() is useless
for non-blocking applications. IncrementalParser produces the same
kind of output as iterparse(), but control of when to feed it data
is transferred to the user of the API.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 TreeBuilder doesn't do parsing, it takes already parsed data: it has a
 start() method to open a tag, and a data() method to add raw text
 inside that tag.

That is correct. However, the XMLParser has a feed() method that sends new data 
into the parser, and a close() method that tells the parser that it's done. So 
there already is an incremental parsing interface, and your change is 
duplicating that interface under a different name. Specifically, 
IncrementalParser has exactly the same interface as XMLParser when it comes to 
entering data, but uses different method names for it. This is a Bad Design 
because it introduces an unnecessary inconsistency in the API.

However, what you are trying to change is not the way data is *entered* into 
the parser. What you are after is to change the way the *parsed* data is 
*presented* to the user. That is not the responsibility of the parser, it's the 
responsibility of the TreeBuilder.

In your code, the TreeBuilder builds a tree, and the new interface 
*additionally* collects parse events in a list. So, the right way to do it 
would be to change the parser *target* to do both, i.e. to build a tree and 
collect events at the same time.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Antoine Pitrou

Antoine Pitrou added the comment:

  Well, I would rather like to understand yours.
 
 My point is that the IncrementalParser uses a TreeBuilder that builds
 an XML tree in the back. So I'm wondering why you are saying that it
 doesn't build a tree.

Unless I'm reading it wrong, when _setevents() is called, the internal
hooks are rewired to populate the events list, rather than call the
corresponding TreeBuilder methods. So, yes, there's a TreeBuilder
somewhere, but it stands unused.

(the _setevents() method already existed on the C impl, by the way.
I added it to the Python impl to make things more regular and avoid
two separate iterparse() implementations)

 And my
 point is that we shouldn't duplicate the existing *data entry*
 interface for that, especially not under different names for
 identical functionality, but only add an interface to access those
 events as *output*, i.e. to add the bit of the API that's actually
 missing.

What difference would that make? In the end, you mustn't mix
event-driven/non-blocking and cumulative/blocking styles of
programming, so having two separate APIs doesn't strike me as
a problem (it may be a good thing actually, for clarity reasons).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 Unless I'm reading it wrong, when _setevents() is called, the internal
 hooks are rewired to populate the events list, rather than call the
 corresponding TreeBuilder methods. So, yes, there's a TreeBuilder
 somewhere, but it stands unused.

Yes, you *are* reading it wrong. For example, the start callback calls 
self._start_list, which in turn calls self.target.start(), thus calling into 
the TreeBuilder. That's the thing that constructs the elements that the 
IncrementalParser collects in its events list.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Eli Bendersky

Eli Bendersky added the comment:

 rantIt's easy to say that as a core developer with commit rights who  can 
 simply hide changes in a low frequented bug tracker without
 notifying those who have to know about these changes./rant It's pure  luck 
 I noticed this change during the alpha release cycle.

What *are* you talking about? Who is trying to hide anything? Don't you see the 
history of the bug? Antoine opened it very... openly... and we discussed it. 
Also, committing to pre-alpha trunk is not a sailed ship - if there are 
problems, things can be easily fixed.

Please stop complaining and learn to use the tools properly. Here's one tip: 
http://mail.python.org/mailman/listinfo/python-bugs-list - subscribe to it and 
use your mail client's filtering to see new bugs on topics that interest you. 
It isn't very hard.

Another tip: http://mail.python.org/mailman/listinfo/python-checkins - 
subscribe to it and use your mail client's filtering to see new checkins on 
topics that interest you. It isn't hard either.

Tip #3: ask to be added to the experts list for ET. No one would object to 
that, and it would increase the likelihood that people add you to the nosy list 
directly for ET-related issues (particularly added features).

Now back to a productive discussion please...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue17741] event-driven XML parser

2013-08-09 Thread Stefan Behnel

Stefan Behnel added the comment:

 ask to be added to the experts list for ET

Already done, see the corresponding python-dev thread.


 Now back to a productive discussion please...

I think we already are. Keep reading through the rest of the posts.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue17741
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



  1   2   >