@All thanks. I can't use etree/SAX because with them we can't get the complete line. Of course we could get it by tag name, but we are not sure about the tags either. The only thing we know is that whatever the children of <country> are, we need to put them into a new file named after the country.
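Not knowing the child tags is not necessarily a blocker for etree: `xml.etree.ElementTree.iterparse` hands back each complete `<country>` element with whatever children it contains, and lets you discard it afterwards, so an 800 MB file never has to sit in memory at once. The following is a minimal sketch, assuming every `<country>` sits directly under the root element and carries a `name` attribute that is safe to use as a file name; the helper name `split_countries` and its `out_dir` parameter are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def split_countries(source, out_dir="."):
    """Hypothetical helper: write each <country> element, with all of its
    children (whatever they are), to <out_dir>/<name>.xml, wrapped in a
    fresh copy of the root tag (e.g. <data>).

    Assumes <country name="..."> elements are direct children of the root.
    """
    written = []
    # Ask for "start" events too, so the first event gives us the root.
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "country":
            # The element is complete here, children and attributes included.
            wrapper = ET.Element(root.tag)
            wrapper.append(elem)
            path = os.path.join(out_dir, elem.get("name") + ".xml")
            ET.ElementTree(wrapper).write(path, encoding="utf-8",
                                          xml_declaration=True)
            written.append(path)
            # Drop the processed subtree so memory use stays flat.
            root.remove(elem)
    return written
```

Usage would be something like `split_countries("input.xml", "out")`. The element is written only on its `end` event, because on `start` its children have not been parsed yet.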
Note: The file size is around 800 MB. For other requirements (like converting XML to CSV) I used lxml and other tools, but in my current scenario I don't know which child tags will be there.

###### INPUT XML #######
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
        .......
        .......
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

######## outputxml (Liechtenstein.xml) ######
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
        .......
        .......
    </country>
</data>

##### #####
<?xml version="1.0"?>
<data>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

On Thu, May 8, 2014 at 1:19 AM, Stefan Behnel <stefan...@behnel.de> wrote:
> Neil D. Cerutti, 07.05.2014 20:04:
> > On 5/7/2014 1:39 PM, Alan Gauld wrote:
> >> On 07/05/14 17:56, Stefan Behnel wrote:
> >>> Alan Gauld, 07.05.2014 18:11:
> >>>> and ElementTree (aka etree). The documenation gives examples of both.
> >>>> sax is easiest and fastest for simple XML in big files ...
> >>>
> >>> I wouldn't say that SAX qualifies as "easiest". Sure, if the task is
> >>> something like "count number of abc tags" or "find tag xyz and get an
> >>> attribute value from it", then SAX is relatively easy and also quite
> >>> fast.
> >>
> >> That's pretty much what I said. simple task, big file. sax is easy.
> >>
> >> For anything else use etree.
> >>
> >>> BTW, ElementTree also has a SAX-like parsing mode, but comes with a
> >>> simpler interface and saner parser configuration defaults.
> >>
> >> My experience was different. Etree is powerful but for simple
> >> tasks I just found sax easier to grok. (And most of my XML parsing
> >> is limited to simple extraction of a field or two.)
> >
> > If I understand this task correctly it seems like a good application for
> > SAX. As a state machine it could have a mere two states, assuming we aren't
> > troubled about the parent nodes of Country tags.
>
> Yep, that's the kind of thing I meant. You get started, just trying to get
> out one little field out of the file, then notice that you need another
> one, and eventually end up writing a page full of code where a couple of
> lines would have done the job. Even just safely and correctly getting the
> text content of an element is surprisingly non-trivial in SAX.
>
> It's still unclear what the OP wanted exactly, though. To me, it read more
> like the task was to copy some content over from one XML file to another,
> in which case doing it in ET is just trivial thanks to the tree API, but
> SAX requires you to reconstruct the XML brick by brick here.
>
>
> > In my own personal case, I partly prefer xml.sax simply because it ignores
> > namespaces, a nice benefit in my cases. I wish I could make ElementTree do
> > that.
>
> The downside of namespace unaware parsing is that you never know what you
> get. It works for some input, but it may also just fail arbitrarily, for
> equally valid input.
>
> One cool thing about ET is that it makes namespace aware processing easy by
> using fully qualified tag names (one string says it all). Most other XML
> tools (including SAX) require some annoying prefix mapping setup that you
> have to carry around in order to tell the processor that you are really
> talking about the thing that it's showing to you.
>
> Stefan
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>
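For comparison, Neil's two-state SAX idea ("outside a country" versus "inside a country") can be sketched with the standard library's `xml.sax.saxutils.XMLGenerator`, which re-serializes SAX events and so takes care of most of the brick-by-brick rebuilding Stefan mentions. This is a minimal sketch, assuming `<country>` elements are not nested inside each other and each has a `name` attribute usable as a file name; the class name `CountrySplitter` is hypothetical, and comments and processing instructions inside a country are not copied.

```python
import os
import xml.sax
from xml.sax.saxutils import XMLGenerator


class CountrySplitter(xml.sax.ContentHandler):
    """Hypothetical two-state handler. Outside a <country>, events are
    ignored; inside one, every event is forwarded to an XMLGenerator that
    writes <out_dir>/<name>.xml, wrapped in a copy of the root tag."""

    def __init__(self, out_dir="."):
        super().__init__()
        self.out_dir = out_dir
        self.root_name = None  # e.g. "data", reused as the wrapper tag
        self.out = None        # open output file while inside a <country>
        self.gen = None        # XMLGenerator writing to self.out
        self.depth = 0         # element depth within the current <country>
        self.written = []

    def startElement(self, name, attrs):
        if self.root_name is None:
            self.root_name = name
        if self.gen is None and name == "country":
            # State change: outside -> inside. Open a new output document.
            path = os.path.join(self.out_dir, attrs["name"] + ".xml")
            self.out = open(path, "w", encoding="utf-8")
            self.gen = XMLGenerator(self.out, encoding="utf-8")
            self.gen.startDocument()
            self.gen.startElement(self.root_name, {})
            self.written.append(path)
        if self.gen is not None:
            self.depth += 1
            self.gen.startElement(name, attrs)

    def endElement(self, name):
        if self.gen is not None:
            self.gen.endElement(name)
            self.depth -= 1
            if self.depth == 0:
                # State change: inside -> outside. Close wrapper and file.
                self.gen.endElement(self.root_name)
                self.gen.endDocument()
                self.out.close()
                self.out = self.gen = None

    def characters(self, content):
        # Text (including whitespace) is copied only while inside a country.
        if self.gen is not None:
            self.gen.characters(content)
```

It would be driven with `xml.sax.parse("input.xml", CountrySplitter("out"))`. Even in this small case, the handler has to track state and depth by hand, which is the extra bookkeeping Stefan contrasts with the tree API.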