[R] Parsing very large xml datafiles with SAX (XML package): What data structure should I favor?

Frederic Fournier Fri, 26 Oct 2012 10:02:02 -0700

Hello again,

I have another question related to parsing a very large xml file with SAX:
what kind of data structure should I favor? Unlike the DOM functions, which
can return lists of relevant nodes and let me use the various versions of
'apply', SAX parsing hands me one thing at a time.


I first tried the simple solution of appending to lists as I get the data,
but I very soon realized that this is way too slow.
Then I tried pre-declaring large data.frames of NA and populating them with
[[<-.data.frame. But this is quite slow too.
I then tried pre-declaring a large matrix of NA and populating it with [<-.
This is better... but still unmanageable as the xml files become large.
I also tried using an environment as a hash structure, but realized that
although this is simple for the programmer, it stalls the parsing.
I then tried to
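
To make the pre-allocation idea above concrete, here is a minimal sketch of
one possible variant: a pre-allocated vector held in a closure that doubles
its capacity when full, filled from SAX handlers. The helper names
(make_collector, make_handlers) and the target element name "item" are
hypothetical and only for illustration. The handler names startElement, text
and endElement are the ones xmlEventParse() from the XML package looks up.
Whether this is fast enough for a given file is something to measure, not
something this sketch guarantees.

```r
# Hypothetical helper: a growable character vector that doubles its capacity
# when full, so the buffer is reallocated only a logarithmic number of times.
make_collector <- function(initial_capacity = 1024L) {
  values <- character(initial_capacity)
  n <- 0L
  add <- function(x) {
    if (n == length(values)) {
      # Grow geometrically instead of one element at a time.
      values <<- c(values, character(max(1L, length(values))))
    }
    n <<- n + 1L
    values[n] <<- x
    invisible(NULL)
  }
  get <- function() values[seq_len(n)]   # drop the unused tail
  list(add = add, get = get)
}

# Hypothetical handler factory: collects the text of every <target> element.
make_handlers <- function(target = "item") {
  col <- make_collector()
  inside <- FALSE
  buffer <- character(0)
  handlers <- list(
    startElement = function(name, attrs, ...) {
      if (name == target) {
        inside <<- TRUE
        buffer <<- character(0)
      }
    },
    text = function(content, ...) {
      # The parser may deliver one text node in several pieces.
      if (inside) buffer <<- c(buffer, content)
    },
    endElement = function(name, ...) {
      if (name == target) {
        col$add(paste(buffer, collapse = ""))
        inside <<- FALSE
      }
    }
  )
  list(handlers = handlers, result = function() col$get())
}

# Usage, run only if the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  h <- make_handlers("item")
  XML::xmlEventParse("<root><item>a</item><item>b</item></root>",
                     handlers = h$handlers, asText = TRUE)
  print(h$result())
}
```

The state lives in the enclosing environment of the handlers, so each SAX
event only touches one slot of the buffer, and the result is trimmed to the
filled part once parsing is done.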


