Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
Just guessing, but is it something to do with this (from the docstring of parse-seq)? "it will be run in a separate thread and be allowed to get ahead by queue-size items, which defaults to maxint." As I've figured it out, when there's XPP on the classpath, and I'm using it, the code that

Re: Problems with lazy-xml

2011-02-12 Thread Chouser
On Sat, Feb 12, 2011 at 4:16 AM, Marko Topolnik marko.topol...@gmail.com wrote: Just guessing, but is it something to do with this (from the docstring of parse-seq)? "it will be run in a separate thread and be allowed to get ahead by queue-size items, which defaults to maxint." As I've

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
How about replacing (drop-last sibs) with (remove vector? sibs)? remove will not access the next seq member in advance, and the only vector in sibs is the last element. I tried this change and it works for the test code from the original post. On Feb 12, 4:43 pm, Chouser chou...@gmail.com
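A minimal sketch of the difference described above, assuming an unchunked lazy seq. `noisy-seq`, `realized-by` and the sample data are hypothetical helpers made up for illustration; they are not part of lazy-xml. Each element is logged at the moment it is realized, so the log shows how far ahead each function reads in order to produce the first result.

```clojure
(defn noisy-seq
  "Unchunked lazy seq over xs; appends each element to the atom `log`
  at the moment its cell is realized. Hypothetical helper."
  [log xs]
  (lazy-seq
    (when-let [s (seq xs)]
      (swap! log conj (first s))
      (cons (first s) (noisy-seq log (rest s))))))

(defn realized-by
  "Takes the first element of (f coll) and returns the elements
  that had to be realized to produce it."
  [f]
  (let [log (atom [])]
    ;; the trailing vector stands in for the last element of sibs
    (first (f (noisy-seq log [:a :b [:rest]])))
    @log))

(def via-drop-last (realized-by drop-last))
(def via-remove    (realized-by #(remove vector? %)))

(println "drop-last realized:" via-drop-last)
(println "remove realized:   " via-remove)
```

Here drop-last has to realize both :a and :b before it can hand back :a, because it walks the seq in step with a copy of itself shifted by one element. remove only inspects the element it is about to return.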

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
Also, the xpp-based parser is almost an order of magnitude slower than the sax-based one. The only thing it lacks is a couple of type hints:
  (defn- attrs [^XmlPullParser xpp]
  (defn- ns-decs [^XmlPullParser xpp]
  (let [step (fn [^XmlPullParser xpp]
These hints increase the performance from
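For readers unfamiliar with type hints, here is a small sketch of the mechanism. It uses java.lang.String rather than XmlPullParser so that it runs without XPP on the classpath; the function names are made up for illustration. Without a hint the compiler cannot tell which class `.length` belongs to and falls back to reflection at run time; with the hint it can emit a direct method call.

```clojure
;; At a REPL, (set! *warn-on-reflection* true) makes the compiler
;; report the unhinted call below.

(defn len-reflective
  "No hint: .length is looked up by reflection on every call."
  [s]
  (.length s))

(defn len-hinted
  "^String lets the compiler emit a direct String.length() call."
  [^String s]
  (.length s))

;; Both return the same value; only the dispatch mechanism differs.
(println (len-reflective "lazy-xml") (len-hinted "lazy-xml"))
```

How large the speedup is depends on how often the call sits in a hot loop, which for a pull parser stepping through every event is presumably very often.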

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
On Feb 12, 7:55 pm, Marko Topolnik marko.topol...@gmail.com wrote: How about replacing (drop-last sibs) with (remove vector? sibs)? This was slightly naive. We also need these changes: In siblings: :end-element [[(rest s)]] In mktree: (cons (struct element (:name elem)

Re: Problems with lazy-xml

2011-02-12 Thread Marko Topolnik
In fact, it is enough to replace (drop-last sibs) with (remove seq? sibs). On Feb 12, 9:54 pm, Marko Topolnik marko.topol...@gmail.com wrote: On Feb 12, 7:55 pm, Marko Topolnik marko.topol...@gmail.com wrote: How about replacing (drop-last sibs) with (remove vector? sibs)?

Re: Problems with lazy-xml

2011-02-11 Thread Marko Topolnik
Right now I'm working with a 300k-record file, but the code must scale into the millions, and, as I mentioned, it is already spewing OutOfMemory errors. Also, on a more abstract level, it's just not right to thrash the memory of a concurrent server-side component for absolutely no good reason.

Re: Problems with lazy-xml

2011-02-11 Thread Benny Tsai
Can you post a link to a (sanitized, if need be) sample file? On Feb 11, 1:21 am, Marko Topolnik marko.topol...@gmail.com wrote: Right now I'm working with a 300k-record file, but the code must scale into the millions, and, as I mentioned, it is already spewing OutOfMemory errors. Also, on a

Re: Problems with lazy-xml

2011-02-11 Thread Marko Topolnik
http://db.tt/iqTo1Q4 This is a sample XML file with 1000 records -- enough to notice a significant delay when evaluating the code from the original post. Chouser, could you spare a second here? I've been looking and looking at mktree and siblings for two days now and can't for the life of me

Re: Problems with lazy-xml

2011-02-11 Thread Benny Tsai
I can confirm that the same thing is happening on my end as well. The XML is parsed lazily: user=> (time (let [root (parse-trim (reader "huge.xml"))] (-> root :content type))) "Elapsed time: 45.57367 msecs" clojure.lang.LazySeq ...but as soon as I try to do anything with the struct map for the

Re: Problems with lazy-xml

2011-02-11 Thread Chris Perkins
On Feb 11, 5:07 am, Marko Topolnik marko.topol...@gmail.com wrote: http://db.tt/iqTo1Q4 This is a sample XML file with 1000 records -- enough to notice a significant delay when evaluating the code from the original post. Chouser, could you spare a second here? I've been looking and looking

Re: Problems with lazy-xml

2011-02-11 Thread Chouser
On Fri, Feb 11, 2011 at 2:35 PM, Chris Perkins chrisperkin...@gmail.com wrote: On Feb 11, 5:07 am, Marko Topolnik marko.topol...@gmail.com wrote: http://db.tt/iqTo1Q4 This is a sample XML file with 1000 records -- enough to notice a significant delay when evaluating the code from the original

Problems with lazy-xml

2011-02-10 Thread Marko Topolnik
I am required to process a huge XML file with 300,000 records. The structure is like this: <root> <header> </header> <body> <record>...</record> <record>...</record> ... 299,998 more </body> </root> Obviously, it is of key importance not to allocate memory for all the records at once.

Re: Problems with lazy-xml

2011-02-10 Thread Mike Meyer
On Thu, 10 Feb 2011 07:22:55 -0800 (PST) Marko Topolnik marko.topol...@gmail.com wrote: I am required to process a huge XML file with 300,000 records. The structure is like this: <root> <header> </header> <body> <record>...</record> <record>...</record> ... 299,998 more