Dear all,

For my diploma thesis I have to import huge XML files into R for
statistical processing; "huge" here means a size of about 33 MB.

I'm using the XML package, version 1.9.

Since reading the complete file into R via xmlTreeParse either doesn't
work or is too slow, I'm trying to use xmlEventParse, but I am
completely stuck.

The file contains many different types of nodes:

+ <configuration>

- <Data>
  - <dataSets noOfDataSets="50000">
   - <dataSet number="1">
    - <measurements>
     - <measurement number="1">
      - <MRType1>
         <date>21.04.2005</date>
         <time>10:00</time>
         <plotCode>1</plotCode>
         <collarCode />
         <value>2,33</value>
         <depth />
        </MRType1>
       </measurement>
     - <measurement number="2">
      - <MRType1>
         <date>21.04.2005</date>
         <time>10:00</time>
         <plotCode>1</plotCode>
         <collarCode />
         <value>2,33</value>
         <depth />
        </MRType1>
       <MRType2>
        ...
+ <personData>
+ <siteData>

I only need the measurement/MRType1 nodes. How can I extract just
those? Currently I am trying the following code:

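library(XML)

# Accumulators must exist before the handlers append to them.
startElement.name <- character(0)
startElement.value <- character(0)

# Called for every opening tag: record the element name.
xtract.startElement <- function(name, attr) {
        startElement.name <<- c(startElement.name, name)
        }

# Called for every text node: record the text.
xtract.text <- function(text) {
        startElement.value <<- c(startElement.value, text)
        }

xmlEventParse("/input.xml", list(startElement=xtract.startElement,
text=xtract.text), useTagName=TRUE, addContext = FALSE)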

This only gives me two vectors: one with all the node names (including
the ones I don't need) and one with all the values (again including
the ones I don't need). I can't see how to match names to values this
way.

What I want is:

No.     date    time    plotCode        collarCode      value   depth
1       ...     ...     ...             ...             ...     ...
2       ...     ...     ...             ...             ...     ...
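
One direction that might produce such a table is to let the handlers
keep state: remember whether the parser is currently inside an MRType1
node and which child element is open, and collect one row per MRType1.
The following is only a minimal sketch under assumptions, not a tested
solution for the real file: make_handlers, sample_xml and fields are
hypothetical names, the inline sample is a shortened, well-formed
stand-in for the structure shown above, and the values stay as
character strings (so "2,33" is not converted to a number).

```r
library(XML)

# Hypothetical, shortened stand-in for the real file (assumed well-formed).
sample_xml <- '<Data><dataSets noOfDataSets="3"><dataSet number="1"><measurements>
<measurement number="1"><MRType1><date>21.04.2005</date><time>10:00</time>
<plotCode>1</plotCode><collarCode /><value>2,33</value><depth /></MRType1></measurement>
<measurement number="2"><MRType1><date>21.04.2005</date><time>10:00</time>
<plotCode>1</plotCode><collarCode /><value>2,33</value><depth /></MRType1></measurement>
<measurement number="3"><MRType2><date>22.04.2005</date></MRType2></measurement>
</measurements></dataSet></dataSets></Data>'

# Child elements of MRType1 that become columns.
fields <- c("date", "time", "plotCode", "collarCode", "value", "depth")

# Build SAX handlers that share state through the enclosing environment.
make_handlers <- function(fields) {
  rows <- list()              # one named character vector per MRType1
  current <- NULL             # row being filled; NULL outside MRType1
  field <- NULL               # name of the open child element, if any
  number <- NA_character_     # "number" attribute of the open <measurement>

  list(
    startElement = function(name, attrs, ...) {
      if (name == "measurement") {
        if (!is.null(attrs) && "number" %in% names(attrs)) {
          number <<- attrs[["number"]]
        }
      } else if (name == "MRType1") {
        current <<- c(No = number,
                      setNames(rep(NA_character_, length(fields)), fields))
      } else if (!is.null(current) && name %in% fields) {
        field <<- name
      }
    },
    text = function(x, ...) {
      # Text may arrive in several chunks, so append to what is there.
      if (!is.null(current) && !is.null(field)) {
        old <- current[[field]]
        current[[field]] <<- if (is.na(old)) x else paste0(old, x)
      }
    },
    endElement = function(name, ...) {
      if (name == "MRType1" && !is.null(current)) {
        rows[[length(rows) + 1]] <<- current
        current <<- NULL
      }
      field <<- NULL
    },
    # Not a parser callback: returns the collected rows as a data frame.
    result = function() {
      as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
    }
  )
}

h <- make_handlers(fields)
invisible(xmlEventParse(sample_xml, handlers = h, asText = TRUE))
df <- h$result()
print(df)
```

For the real file, the first argument would be the file path and
asText would be dropped. Empty elements such as <collarCode /> stay NA,
and nodes outside MRType1 (for example MRType2) are skipped.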

Any help is really appreciated. I have been trying all week, starting
with xmlTreeParse, which works fine for files with 200 entries, but
for files with 50000 entries it keeps crashing my Core 2 Duo, 2.4 GHz
machine.

Thanks so much in advance! If you need any further information, code
snippets or XML file details, please do not hesitate to mail me!

Alex

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.