Circle 2 of 'The R Inferno' may help you.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

In particular, it has an example of how to do what
Duncan suggested.
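For the archive, the shape of that fix (a sketch with toy names of my
own, not Duncan's code): collect the pieces in a pre-allocated list and
call 'rbind' once at the end, instead of growing an object inside the
loop.

n <- 100
pieces <- vector("list", n)            # pre-allocate the container
for (i in seq_len(n)) {
    # stand-in for the real per-file work
    pieces[[i]] <- cbind(id = i, result = i^2)
}
collected <- do.call(rbind, pieces)    # a single rbind at the end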

Pat


On 21/12/2012 15:27, Peter Meißner wrote:
Here is a working example that reproduces the behavior by creating 1000
XML files and then parsing them.

On my PC, R starts out using about 90 MB of RAM, and every chunk cycle
adds another 10-12 MB, so I end up at about 200 MB of RAM usage.

In the real code one chunk cycle eats about 800 MB of RAM, which was one
of the reasons I decided to split the process into separate chunks in
the first place.

----------------
'Minimal' Example - START
----------------

# the general problem
require(XML)

chunk <- function(x, chunksize){
    # source: http://stackoverflow.com/a/3321659/1144966
    x2 <- seq_along(x)
    split(x, ceiling(x2/chunksize))
}

chunky <- chunk(paste("test", 1:1000, ".xml", sep = ""), 100)

# write 1000 small XML files to parse afterwards
for(i in 1:1000){
    writeLines(c(paste('<?xml version="1.0"?>\n<note>\n    <to>Tove</to>\n    <nr>',
                       i, '</nr>\n    <from>Jani</from>\n    <heading>Reminder</heading>',
                       sep = ""),
                 rep('<body>Do not forget me this weekend!</body>\n', sample(1:10, 1)),
                 '</note>'),
               paste("test", i, ".xml", sep = ""))
}

for(k in 1:length(chunky)){
    gc()
    print(chunky[[k]])
    xmlCatcher <- NULL

    for(i in 1:length(chunky[[k]])){
        filename   <- chunky[[k]][i]
        xml        <- xmlTreeParse(filename)
        xml        <- xmlRoot(xml)
        result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
        id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
        dummy      <- cbind(id, result)
        xmlCatcher <- rbind(xmlCatcher, dummy)
    }
    # sep = "" so the file is named "xmlCatcher1.RData", not "xmlCatcher 1 .RData"
    save(xmlCatcher, file = paste("xmlCatcher", k, ".RData", sep = ""))
}

----------------
'Minimal' Example - END
----------------
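A variant of the inner loop that might be worth testing: parse to
C-level nodes with xmlParse() and release each document with free()
from the XML package. Whether that stops the growth here is an
assumption on my part, not something I have verified:

for(i in 1:length(chunky[[k]])){
    filename   <- chunky[[k]][i]
    doc        <- xmlParse(filename)    # C-level document, held by libxml2
    result     <- sapply(getNodeSet(doc, "//body"), xmlValue)
    id         <- sapply(getNodeSet(doc, "//nr"), xmlValue)
    xmlCatcher <- rbind(xmlCatcher, cbind(id, result))
    free(doc)                           # release the document memory explicitly
}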



On 21.12.2012 15:14, jim holtman wrote:
Can you send either your actual script or the console output so I can
get an idea of how fast memory is growing? Also, at the end, can you
list the sizes of the objects in the workspace? Here is a function I
use to get the sizes:

my.ls <- function(pos = 1, sorted = FALSE, envir = as.environment(pos))
{
    .result <- sapply(ls(envir = envir, all.names = TRUE),
                      function(..x) object.size(eval(as.symbol(..x),
                                                     envir = envir)))
    if (length(.result) == 0)
        return("No objects to list")
    if (sorted) {
        .result <- rev(sort(.result))
    }
    .ls <- as.data.frame(rbind(as.matrix(.result),
                               `**Total` = sum(.result)))
    names(.ls) <- "Size"
    .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
                        format = "f")
    .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
                                 function(x) class(eval(as.symbol(x),
                                                        envir = envir))[1L])),
                   "-------")
    .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
                                  function(x) length(eval(as.symbol(x),
                                                          envir = envir)))),
                    "-------")
    .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
                               function(x) paste(dim(eval(as.symbol(x),
                                                          envir = envir)),
                                                 collapse = " x "))),
                 "-------")
    .ls
}


which gives output like this:

my.ls()
                  Size       Class  Length     Dim
.Last             736    function       1
.my.env.jph        28 environment      39
x                 424     integer     100
y              40,024     integer   10000
z           4,000,024     integer 1000000
**Total     4,041,236     ------- ------- -------


On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner
<peter.meiss...@uni-konstanz.de> wrote:
Thanks for your answer,

Yes, I tried 'gc()'; it did not change the behavior.

best, Peter


On 21.12.2012 13:37, jim holtman wrote:

Have you tried putting calls to 'gc()' at the top of the first loop to
make sure memory is reclaimed? You can print the result of 'gc()' to
see how fast memory is growing.
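Something like this, as a minimal illustration ('chunk' stands in for
your real chunk list):

chunk <- list(1:10, 11:20, 21:30)
for (k in 1:length(chunk)) {
    print(gc())   # triggers a collection and reports memory use each cycle
    # ... work on chunk[[k]] here ...
}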

On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
<peter.meiss...@uni-konstanz.de> wrote:

Hey,

I have a double loop like this:


chunk <- list(1:10, 11:20, 21:30)
for(k in 1:length(chunk)){
    print(chunk[[k]])
    DummyCatcher <- NULL
    for(i in chunk[[k]]){
        print("i load something")
        dummy <- 1
        print("i do something")
        dummy <- dummy + 1
        print("i put it together")
        DummyCatcher <- rbind(DummyCatcher, dummy)
    }
    print("i save a chunk and restart with another chunk of data")
}

The problem now is that with each 'chunk' cycle the memory used by R
grows and grows until it exceeds my RAM, even though any single chunk
cycle alone needs only about a fifth of what I have overall.

Does somebody have an idea why this behavior might occur? Note that all
the objects (like 'DummyCatcher') are reused in every cycle, so I would
expect the RAM usage to stay about the same after the first 'chunk'
cycle.


Best, Peter


System info:

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Win7 Enterprise, 8 GB RAM






--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany

+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/





--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')

