Re: Parsing large XML files

2013-12-19 Thread Peter Ullah
Thank you, everyone, for your advice. I found it useful and think that I am part-way to a solution using clojure.data.xml/source-seq, as suggested by danneu. I'll post what I have done so far in the hope that it might help someone else... comments on style welcome. *Solution*: Given the following XML

Re: Parsing large XML files

2013-12-17 Thread danneu
Good question. Every lib that came to mind when I saw clojure.data.xml/parse's tree of Elements {:tag _, :attrs _, :content _} only works on zippers, which apparently sit in memory. One option is to use `clojure.data.xml/source-seq` to get back a lazy sequence of Events {:type _, :name _, :attrs

Re: Parsing large XML files

2013-12-17 Thread Matching Socks
On general Java principles, you can "stream" a large XML file with either SAX or StAX and pluck what you like from it without wasting memory on the rest. If the file is a long series of small sections that could be examined separately, you might use SAX to partition the file and then subject e
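
The StAX approach mentioned above could look roughly like the following. This is a minimal sketch, not the poster's code: the class name `StaxPluck`, the helper `textOf`, and the sample document are all hypothetical, and it assumes the elements of interest contain only text. It walks the stream one event at a time and keeps only the text of elements with a given name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the names here are illustrative only.
public class StaxPluck {

    // Returns the text of every element named `tag`, reading the input as a
    // stream of events instead of building a tree of the whole document.
    // Assumes the matched elements hold text only (no nested elements).
    public static List<String> textOf(InputStream in, String tag) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> found = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // next() advances the cursor by one event; everything that is
                // not a matching start tag is simply skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && tag.equals(reader.getLocalName())) {
                    // getElementText() consumes up to the matching end tag.
                    found.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data standing in for a large file.
        String xml = "<catalog><book><title>A</title><price>1</price></book>"
                   + "<book><title>B</title><price>2</price></book></catalog>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(textOf(in, "title")); // prints [A, B]
    }
}
```

For a real 500MB file one would pass a `FileInputStream` instead of the in-memory sample. The result list here still holds every match, so for very many matches it may be better to process each value as it is found rather than collect them.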

Re: Parsing large XML files

2013-12-17 Thread Ryan Senior
As far as I know, using zippers like that will need the whole XML data structure to be in memory. data.xml returns fast because it's lazy (uses pull parsing). Until you start traversing down the structure, it won't parse more of it. data.xml should also be fully streaming, so it shouldn't requir

Parsing large XML files

2013-12-17 Thread Peter Ullah
Hi all, I'm attempting to parse a large (500MB) XML file; specifically, I am trying to extract various parts using XPath. I've been using the examples presented here: http://clojure-doc.org/articles/tutorials/parsing_xml_with_zippers.html and all was going well when tested against small files, however no