Re: [R] Parsing large XML documents in R - how to optimize the speed?

2012-08-11 Thread Duncan Temple Lang
Hi Frederic, You definitely want to be using xmlParse() (or, equivalently, xmlTreeParse(..., useInternalNodes = TRUE)). This then allows the use of getNodeSet(). I would suggest you use Rprof() to find out where the bottlenecks arise, e.g. in the XML functions, in S4 code, or in your own code
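A minimal sketch of the advice above, using the XML package's xmlParse()/getNodeSet() and base R's Rprof(). The document content and element names (spectra/spectrum) are made up for illustration; a real MS/MS file would be passed as a file path instead of an in-memory string.

```r
library(XML)  # assumes the XML package is installed

# Small in-memory document standing in for the real file
xml_text <- '<spectra><spectrum id="1"/><spectrum id="2"/></spectra>'

# xmlParse() builds an internal (C-level) DOM -- equivalent to
# xmlTreeParse(..., useInternalNodes = TRUE)
doc <- xmlParse(xml_text, asText = TRUE)

# getNodeSet() runs an XPath query against that internal tree
nodes <- getNodeSet(doc, "//spectrum")
length(nodes)  # 2

# Profile the extraction step to see where time is spent
Rprof("profile.out")
ids <- sapply(nodes, xmlGetAttr, "id")
Rprof(NULL)
# summaryRprof("profile.out") then reports time by function,
# separating XML-package calls from your own code
```

The key point is useInternalNodes = TRUE: it keeps the tree as libxml2 C structures and avoids converting every node to an R object up front.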

Re: [R] Parsing large XML documents in R - how to optimize the speed?

2012-08-11 Thread Erdal Karaca
If this is an option for you: an XML database can handle (very) huge XML files and lets you query nodes very efficiently. Then you could query the XML database from R (using REST) to do your statistics. There are some open-source XQuery/XML databases available. 2012/8/11 Frederic Fournier
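As a sketch of that workflow, here is how such a REST request could be built from R with base functions only. The endpoint URL and the query-parameter convention are assumptions (modeled loosely on BaseX-style REST servers); adjust both to the database you actually run. The commented-out line shows where the real call would go, since it requires a running server.

```r
# Hypothetical XML-database REST endpoint (BaseX-style; an assumption)
base_url <- "http://localhost:8984/rest/msms"

# An XQuery/XPath expression selecting the nodes of interest
xquery <- "//spectrum[@score > 0.9]"

# Percent-encode the query and attach it as a URL parameter
req_url <- paste0(base_url, "?query=", URLencode(xquery, reserved = TRUE))

# resp <- readLines(url(req_url))  # requires the database to be running;
#                                  # returns the matching XML fragments
```

The database does the heavy node selection on its own indexes, so only the (small) result set crosses into R.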

[R] Parsing large XML documents in R - how to optimize the speed?

2012-08-10 Thread Frederic Fournier
Hello everyone, I would like to parse very large XML files from MS/MS experiments and create R objects from their content. (By very large, I mean up to 5-10 GB, although I am using a 'small' 40 MB file to test my code.) My first attempt at parsing the 40 MB file, using the XML package, took
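For files in the 5-10 GB range described above, the usual alternative to building a full tree is the XML package's event-driven (SAX-style) parser, which streams the document and never holds it in memory. A minimal sketch; the element names are made up for illustration, and a real handler would accumulate the fields needed to build the R objects rather than just count:

```r
library(XML)  # assumes the XML package is installed

# Write a tiny stand-in document to disk; a real run would pass the
# multi-GB file path directly
tmp <- tempfile(fileext = ".xml")
writeLines('<spectra><spectrum id="1"/><spectrum id="2"/></spectra>', tmp)

n_spectra <- 0L

# xmlEventParse() streams the file and fires callbacks per event,
# so memory use stays flat regardless of file size
xmlEventParse(tmp, handlers = list(
  startElement = function(name, attrs, ...) {
    if (name == "spectrum") n_spectra <<- n_spectra + 1L
  }
))

n_spectra  # 2
```

The trade-off is convenience: there is no XPath over the whole document, so the handlers must do the node selection themselves as elements stream past.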

Re: [R] Parsing large XML documents in R - how to optimize the speed?

2012-08-10 Thread Martin Morgan
On 08/10/2012 03:46 PM, Frederic Fournier wrote: Hello everyone, I would like to parse very large XML files from MS/MS experiments and create R objects from their content. (By very large, I mean up to 5-10 GB, although I am using a 'small' 40 MB file to test my code.) I'm not 100% sure of