Dear all, for my diploma thesis I have to import huge XML files into R for statistical processing. "Huge" here means about 33 MB.
I'm using the XML package, version 1.9. Since reading the complete file into R via xmlTreeParse either doesn't work or is too slow, I'm trying to use xmlEventParse instead, but I got completely stuck.

The file has many different types of nodes (tree view, "+" marks a collapsed node):

    + <configuration>
    - <Data>
      - <dataSets noOfDataSets="50000">
        - <dataSet number="1">
          - <measurements>
            - <measurement number="1">
              - <MRType1>
                  <date>21.04.2005</date>
                  <time>10:00</time>
                  <plotCode>1</plotCode>
                  <collarCode />
                  <value>2,33</value>
                  <depth />
                </MRType1>
              </measurement>
            - <measurement number="2">
              - <MRType1>
                  <date>21.04.2005</date>
                  <time>10:00</time>
                  <plotCode>1</plotCode>
                  <collarCode />
                  <value>2,33</value>
                  <depth />
                </MRType1>
                <MRType2>
                ...
    + <personData>
    + <siteData>

I only need the measurement/MRType1 nodes. How can I do this? Currently I am trying the following code:

    library(XML)

    # accumulators filled by the handlers below
    startElement.name  <- character(0)
    startElement.value <- character(0)

    xtract.startElement <- function(name, attr) {
      startElement.name <<- c(startElement.name, name)
    }
    xtract.text <- function(text) {
      startElement.value <<- c(startElement.value, text)
    }

    xmlEventParse("/input.xml",
                  list(startElement = xtract.startElement, text = xtract.text),
                  useTagName = TRUE, addContext = FALSE)

This only gives me two vectors, one with all the node names (even the ones I don't need) and one with the values (again including the ones I don't need), and I can't put the two together this way. What I want is:

    No.  Date  Time  Plotcode  collarcode  value  depth
    1    ...   ...   ...       ...         ...    ...
    2    ...   ...   ...       ...         ...    ...

Any help is really appreciated. I have tried the whole week, starting with xmlTreeParse, which works fine for files with 200 entries, but for files with 50000 entries it keeps crashing my Core 2 Duo, 2.4 GHz machine.

Thanks so much in advance! If you need any further information, code snippets or XML file details, please do not hesitate to mail!
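One direction that might work, as a rough sketch under assumptions and not something checked against the real 33 MB file: keep a little state inside the handlers, so that text is only collected while the parser is inside an MRType1 element, and emit one row each time MRType1 closes. The helper name make_handlers, the field list and the tiny inline document are made up for illustration. The sketch also assumes that the "number" attribute of measurement is what should end up in the "No." column.

```r
library(XML)

# Hypothetical helper: builds SAX handlers that share state through a closure.
make_handlers <- function(fields = c("date", "time", "plotCode",
                                     "collarCode", "value", "depth")) {
  rows    <- list()          # one character vector per finished MRType1
  current <- NULL            # record being filled; NULL outside MRType1
  field   <- NULL            # name of the child element currently being read
  number  <- NA_character_   # "number" attribute of the enclosing <measurement>

  startElement <- function(name, attrs, ...) {
    if (name == "measurement") {
      number <<- if ("number" %in% names(attrs)) attrs[["number"]] else NA_character_
    } else if (name == "MRType1") {
      current <<- setNames(rep(NA_character_, length(fields)), fields)
    } else if (!is.null(current) && name %in% fields) {
      field <<- name
    }
  }

  text <- function(x, ...) {
    # Text may arrive in several chunks, so append instead of overwriting.
    if (!is.null(current) && !is.null(field)) {
      old <- current[[field]]
      current[[field]] <<- if (is.na(old)) x else paste0(old, x)
    }
  }

  endElement <- function(name, ...) {
    if (name == "MRType1" && !is.null(current)) {
      rows[[length(rows) + 1]] <<- c(No = number, current)
      current <<- NULL
    }
    field <<- NULL
  }

  result <- function() {
    df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
    # Values use a decimal comma ("2,33"), so convert before as.numeric().
    df$value <- as.numeric(sub(",", ".", df$value, fixed = TRUE))
    df
  }

  list(handlers = list(startElement = startElement, text = text,
                       endElement = endElement),
       result = result)
}

# Tiny made-up document with the same nesting as the real file.
xml <- '<Data><dataSets noOfDataSets="1"><dataSet number="1"><measurements>
  <measurement number="1"><MRType1><date>21.04.2005</date><time>10:00</time>
    <plotCode>1</plotCode><collarCode /><value>2,33</value><depth /></MRType1>
  </measurement>
  <measurement number="2"><MRType2><value>9,99</value></MRType2></measurement>
</measurements></dataSet></dataSets></Data>'

h <- make_handlers()
invisible(xmlEventParse(xml, handlers = h$handlers, asText = TRUE,
                        addContext = FALSE))
measurements <- h$result()
```

For the real file the call would presumably become xmlEventParse("/input.xml", handlers = h$handlers, addContext = FALSE). Empty elements such as collarCode stay NA, and everything outside MRType1 (MRType2, personData, siteData, ...) is skipped because no record is open at that point.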
Alex