Just guessing, but is it something to do with this (from the docstring
of parse-seq)?
it will be run in a separate thread and be allowed to get
ahead by queue-size items, which defaults to maxint.
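The read-ahead the docstring describes has the same shape as clojure.core's seque: a background filler thread realizes up to n items of the underlying seq before the consumer asks for them. A minimal illustration in plain clojure.core (not lazy-xml itself -- src and buffered are just illustrative names):

```clojure
;; seque's filler thread may run up to n items ahead of the consumer,
;; which is the same mechanism as parse-seq's queue-size read-ahead.
(def src (map inc (range 100)))   ; some lazy source seq
(def buffered (seque 10 src))     ; background thread stays <= 10 ahead
(first buffered)                  ;=> 1
```

With queue-size defaulting to maxint, the filler thread is effectively free to realize the whole input.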
As far as I've figured out, when XPP is on the classpath and I'm
using it, the code that
On Sat, Feb 12, 2011 at 4:16 AM, Marko Topolnik
marko.topol...@gmail.com wrote:
How about replacing
(drop-last sibs)
with
(remove vector? sibs)
?
remove will not access the next seq member in advance, and the only
vector in sibs is the last element. I tried this change and it works
for the test code from the original post.
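The difference is easy to see with an instrumented lazy seq (a small demo in plain Clojure; noisy and realized are illustrative names, not part of lazy-xml):

```clojure
;; Record each element as it is realized, using an unchunked lazy seq.
(def realized (atom []))

(defn noisy [coll]
  (lazy-seq
    (when-let [s (seq coll)]
      (swap! realized conj (first s))
      (cons (first s) (noisy (rest s))))))

;; drop-last is built on (map (fn [x _] x) s (drop 1 s)), so producing
;; the first element forces the *second* one as well:
(reset! realized [])
(first (drop-last (noisy [1 2 [:end]])))
@realized   ;=> [1 2]

;; remove only realizes the element it is currently testing:
(reset! realized [])
(first (remove vector? (noisy [1 2 [:end]])))
@realized   ;=> [1]
```

That one-element look-ahead is exactly what keeps the next sibling (and transitively its subtree) realized too early.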
On Feb 12, 4:43 pm, Chouser chou...@gmail.com wrote:
Also, the xpp-based parser is almost an order of magnitude slower than
the sax-based one. The only thing it lacks is a couple of type hints:
(defn- attrs [^XmlPullParser xpp]
(defn- ns-decs [^XmlPullParser xpp]
(let [step (fn [^XmlPullParser xpp]
These hints increase the performance from
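For anyone reproducing this: the cost of a missing hint is easiest to see with *warn-on-reflection*. A generic illustration of the mechanism (String here stands in for XmlPullParser):

```clojure
(set! *warn-on-reflection* true)

;; Without a hint the compiler emits a reflective call -- and warns:
(defn slow-len [s] (.length s))          ; Reflection warning on .length

;; With the hint it compiles to a direct virtual-method call:
(defn fast-len [^String s] (.length s))  ; no warning, no reflection
```

Reflective calls in a per-element hot loop easily account for an order-of-magnitude slowdown.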
On Feb 12, 7:55 pm, Marko Topolnik marko.topol...@gmail.com wrote:
This was slightly naive. We also need these changes:
In siblings:
:end-element [[(rest s)]]
In mktree:
(cons
(struct element (:name elem)
In fact, it is enough to replace (drop-last sibs) with (remove seq?
sibs).
On Feb 12, 9:54 pm, Marko Topolnik marko.topol...@gmail.com wrote:
Right now I'm working with a 300k-record file, but the code must scale
into the millions, and, as I mentioned, it is already spewing
OutOfMemory errors. Also, on a more abstract level, it's just not right
to thrash the memory of a concurrent server-side component for
absolutely no good reason.
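For what it's worth, one way to keep the working set flat while still using the tree API is to make sure nothing holds on to the root. A hedged sketch, assuming the root's content is (header body) as in the files discussed here, with process standing in as a placeholder for the real per-record work:

```clojure
(require '[clojure.contrib.lazy-xml :as lx])
(require '[clojure.java.io :as io])

(defn process [record]
  ;; placeholder for the real per-record work
  (println (:name record)))

;; Threading straight through :content avoids binding the root (or body)
;; struct to a local, which would retain every realized record.
(with-open [r (io/reader "huge.xml")]
  (doseq [record (-> (lx/parse-trim r) :content second :content)]
    (process record)))
```

doseq does not hold the head of the seq it walks, so each record becomes garbage as soon as process returns -- provided the parser itself doesn't realize ahead.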
--
Can you post a link to a (sanitized, if need be) sample file?
On Feb 11, 1:21 am, Marko Topolnik marko.topol...@gmail.com wrote:
http://db.tt/iqTo1Q4
This is a sample XML file with 1000 records -- enough to notice a
significant delay when evaluating the code from the original post.
Chouser, could you spare a second here? I've been looking and looking
at mktree and siblings for two days now and can't for the life of me
I can confirm that the same thing is happening on my end as well. The
XML is parsed lazily:
user=> (time (let [root (parse-trim (reader "huge.xml"))]
         (-> root :content type)))
"Elapsed time: 45.57367 msecs"
clojure.lang.LazySeq
...but as soon as I try to do anything with the struct map for the
On Feb 11, 5:07 am, Marko Topolnik marko.topol...@gmail.com wrote:
On Fri, Feb 11, 2011 at 2:35 PM, Chris Perkins chrisperkin...@gmail.com wrote:
I am required to process a huge XML file with 300,000 records. The
structure is like this:
<root>
  <header>
  </header>
  <body>
    <record>...</record>
    <record>...</record>
    ... 299,998 more
  </body>
</root>
Obviously, it is of key importance not to allocate memory for all the
records at once.
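An alternative that sidesteps the tree (and the mktree/siblings machinery) entirely is to consume parse-seq's event stream directly, so memory stays flat no matter how many records there are. A hedged sketch -- the :type/:name keys follow the event structs lazy-xml emits, and the body is a placeholder:

```clojure
(require '[clojure.contrib.lazy-xml :as lx])
(require '[clojure.java.io :as io])

(with-open [r (io/reader "huge.xml")]
  (doseq [e (lx/parse-seq r)]
    (when (and (= :start-element (:type e))
               (= :record (:name e)))
      ;; placeholder: count, accumulate, or dispatch one record here
      (println (:attrs e)))))
```

The trade-off is that each <record>'s children arrive as separate events, so any per-record state has to be accumulated by hand.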
On Thu, 10 Feb 2011 07:22:55 -0800 (PST)
Marko Topolnik marko.topol...@gmail.com wrote: