Neil D. Cerutti, 07.05.2014 20:04:
> On 5/7/2014 1:39 PM, Alan Gauld wrote:
>> On 07/05/14 17:56, Stefan Behnel wrote:
>>> Alan Gauld, 07.05.2014 18:11:
>>>> and ElementTree (aka etree). The documentation gives examples of both.
>>>> sax is easiest and fastest for simple XML in big files ...
>>>
>>> I wouldn't say that SAX qualifies as "easiest". Sure, if the task is
>>> something like "count number of abc tags" or "find tag xyz and get an
>>> attribute value from it", then SAX is relatively easy and also quite
>>> fast.
>>
>> That's pretty much what I said. simple task, big file. sax is easy.
>>
>> For anything else use etree.
>>
>>> BTW, ElementTree also has a SAX-like parsing mode, but comes with a
>>> simpler interface and saner parser configuration defaults.
>>
>> My experience was different. Etree is powerful but for simple
>> tasks I just found sax easier to grok. (And most of my XML parsing
>> is limited to simple extraction of a field or two.)
>
> If I understand this task correctly it seems like a good application for
> SAX. As a state machine it could have a mere two states, assuming we aren't
> troubled about the parent nodes of Country tags.

Yep, that's the kind of thing I meant. You get started, just trying to get
one little field out of the file, then notice that you need another one,
and eventually end up writing a page full of code where a couple of lines
would have done the job. Even just safely and correctly getting the text
content of an element is surprisingly non-trivial in SAX, because the
parser may deliver the text of a single element in several chunks.

It's still unclear what the OP wanted exactly, though. To me, it read more
like the task was to copy some content over from one XML file to another,
in which case doing it in ET is just trivial thanks to the tree API, but
SAX requires you to reconstruct the XML brick by brick here.

> In my own personal case, I partly prefer xml.sax simply because it ignores
> namespaces, a nice benefit in my cases. I wish I could make ElementTree do
> that.

The downside of namespace unaware parsing is that you never know what you
get. It works for some input, but it may also just fail arbitrarily for
equally valid input, e.g. when a document binds a different prefix to the
same namespace.

One cool thing about ET is that it makes namespace aware processing easy by
using fully qualified tag names (one string says it all). Most other XML
tools (including SAX) require some annoying prefix mapping setup that you
have to carry around in order to tell the processor that you are really
talking about the thing that it's showing to you.

Stefan
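
PS: To illustrate the two ET points above (the SAX-like parsing mode and
fully qualified tag names), here is a minimal sketch. Since it's unclear what
the OP's file looks like, the sample document, the namespace URI
"http://example.com/geo", the "name" attribute, the Capital tag and the
capitals() helper are all assumptions made up for the example. Only
ET.iterparse(), Element.get(), Element.findtext() and Element.clear() are
actual ElementTree calls.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the namespace URI and tag names are invented.
XML = b"""<?xml version="1.0"?>
<data xmlns="http://example.com/geo">
  <Country name="Norway"><Capital>Oslo</Capital></Country>
  <Country name="Chile"><Capital>Santiago</Capital></Country>
</data>"""

# ET spells a namespaced tag as "{namespace-uri}localname": one string,
# independent of whatever prefix the document happens to use.
NS = "{http://example.com/geo}"


def capitals(source):
    """Map each Country name to its Capital text, one element at a time."""
    found = {}
    # iterparse() reports an element once its end tag has been read, so the
    # text content is already complete when we look at it.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "Country":
            found[elem.get("name")] = elem.findtext(NS + "Capital")
            elem.clear()  # drop the finished subtree to keep memory use low
    return found


result = capitals(io.BytesIO(XML))
print(result)  # {'Norway': 'Oslo', 'Chile': 'Santiago'}
```

The same function should keep working if the document declared the namespace
with a prefix (say, xmlns:g="http://example.com/geo" and g:Country tags),
because the comparison is against the qualified name and not the prefix.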