On Wed, May 7, 2014 at 1:26 PM, jitendra gupta <jitu.ic...@gmail.com> wrote:
> I cant use etree/SAX because there we cant get complete line , of course we
> can get it by tag name but we are not sure about tag also. Only we know
> what ever child of <country> we need to put in new file with country name.

Why can't you use such an approach here? You're dealing with structured
data: there's no concept of a "line" in XML, so I don't know what you mean.

You can keep an intermediate state of the events you've seen. At some
point, after you encounter the end element of a particular country, you'll
have seen the information you need to determine which file the country
should go to.

The pseudocode would be something like:

################################################################
read events up to beginning of data
buffer = []
while there are still events:
    collect events up to country end into the "buffer"
    decide what file it goes to, and replay the "buffer" into the appropriate file
    clear "buffer"
################################################################

If you don't want to deal with the event-driven approach that SAX
emphasizes, you may still be able to solve this problem with an XML-Pull
parser. You mention that your input is hundreds of megabytes long, in
which case you probably really do need to be careful about memory
consumption. See:

    https://wiki.python.org/moin/PullDom

for an example that filters subtrees. You should be able to quickly adapt
that example to redirect elements based on whatever criteria you decide.
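
To make the pull-parser idea a bit more concrete, here is a rough sketch
using the standard library's xml.dom.pulldom. It is not tested against your
real data, and it makes some assumptions you'll need to check: that each
<country> element carries its name in a "name" attribute, that one output
file per country called "<name>.xml" is what you want, and that the names
are safe to use as file names. The function names iter_countries and
split_by_country and the out_dir parameter are made up for illustration.

```python
import io
import os
from xml.dom import pulldom


def iter_countries(source):
    """Yield (name, xml_text) for each <country> subtree, one at a time."""
    events = pulldom.parse(source)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "country":
            # Build a DOM for this one subtree only; the rest of the
            # document is still consumed as a stream of events.
            events.expandNode(node)
            # Assumption: the country name lives in a "name" attribute.
            yield node.getAttribute("name"), node.toxml()


def split_by_country(source, out_dir):
    """Append each <country> subtree to out_dir/<name>.xml (assumed naming)."""
    for name, xml_text in iter_countries(source):
        path = os.path.join(out_dir, name + ".xml")
        # Append mode, so a country that appears twice ends up in one file.
        with open(path, "a", encoding="utf-8") as out:
            out.write(xml_text + "\n")


# Tiny stand-in for the real input (the structure is assumed).
SAMPLE = b"""<data>
<country name="Liechtenstein"><rank>1</rank></country>
<country name="Singapore"><rank>4</rank></country>
</data>"""

for name, xml_text in iter_countries(io.BytesIO(SAMPLE)):
    print(name, "->", xml_text)
```

For the real file you would pass an open file object (or a file name) to
split_by_country instead of the BytesIO sample. Each country's whole subtree
is written out, so you don't need to know the child tag names in advance.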