Hi Frederic,
You definitely want to be using xmlParse() (or, equivalently,
xmlTreeParse(..., useInternalNodes = TRUE)). This then allows you to use
getNodeSet().
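A minimal sketch of that approach (the file name "spectra.xml" and the XPath expression are placeholders; adjust both to your data):

```r
library(XML)

## Build an internal (C-level) DOM; this is equivalent to
## xmlTreeParse("spectra.xml", useInternalNodes = TRUE)
doc <- xmlParse("spectra.xml")

## XPath query against the internal tree; returns a list of node references
spectra <- getNodeSet(doc, "//spectrum")
length(spectra)

free(doc)  # release the C-level document when finished
```

The internal (C-level) tree is what makes getNodeSet()/XPath fast; the pure-R tree built by xmlTreeParse() without useInternalNodes is much slower to traverse.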
I would suggest you use Rprof() to find out where the bottlenecks arise,
e.g. in the XML functions, in S4 code, or in your own code.
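For example (a hedged sketch; "parse.out", the file name, and the workload are placeholders for whatever you want to profile):

```r
Rprof("parse.out")                        # start the sampling profiler
doc  <- XML::xmlParse("spectra.xml")      # the code you want to profile
hits <- XML::getNodeSet(doc, "//spectrum")
Rprof(NULL)                               # stop profiling

## Time spent per function (self time), sorted most expensive first
summaryRprof("parse.out")$by.self
```

If most of the self time lands in the XML package's C entry points, the parse itself is the cost; if it lands in your own functions or S4 dispatch, that is where to optimize.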
If this is an option for you: an XML database can handle very large XML
files and lets you query nodes very efficiently.
You could then query the XML database from R (using REST) to do your
statistics.
There are several open-source XQuery/XML databases available.
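As a hypothetical sketch of the REST idea, using RCurl against a BaseX-style REST endpoint (the URL, port, database name "msdata", and the XQuery are all placeholders, not part of the original suggestion):

```r
library(RCurl)

## XQuery evaluated server-side; only the small result crosses the wire,
## so the 5-10 Gb document never has to be loaded into R
q <- "count(//spectrum)"

res <- getURL(paste0("http://localhost:8984/rest/msdata?query=",
                     URLencode(q, reserved = TRUE)))
as.numeric(res)
```

The point of this design is that the database keeps the document indexed on disk, so repeated queries are cheap and R only ever sees result sets, not the raw file.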
2012/8/11 Frederic Fournier
Hello everyone,
I would like to parse very large xml files from MS/MS experiments and
create R objects from their content. (By very large, I mean going up to
5-10Gb, although I am using a 'small' 40M file to test my code.)
My first attempt at parsing the 40M file, using the XML package, took [...]