Re: [R] Memory filling up while looping

2012-12-21 Thread jim holtman
Have you tried putting calls to 'gc()' at the top of the first loop to
make sure memory is reclaimed? You can print the result of the 'gc()'
call to see how fast memory use is growing.
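
For example, a minimal sketch along the lines of the dummy loop in your
post (the body is just a placeholder so it runs as-is; the point is only
where the print(gc()) call goes):

chunk <- list(1:10, 11:20, 21:30)
for(k in 1:length(chunk)){
    print(gc())            # force a collection and print memory statistics
    DummyCatcher <- NULL
    for(i in chunk[[k]]){
        DummyCatcher <- rbind(DummyCatcher, i + 1)
    }
}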

On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner
peter.meiss...@uni-konstanz.de wrote:
 Hey,

 I have a double loop like this:


 chunk <- list(1:10, 11:20, 21:30)
 for(k in 1:length(chunk)){
     print(chunk[[k]])
     DummyCatcher <- NULL
     for(i in chunk[[k]]){
         print("i load something")
         dummy <- 1
         print("i do something")
         dummy <- dummy + 1
         print("i do put it together")
         DummyCatcher <- rbind(DummyCatcher, dummy)
     }
     print("i save a chunk and restart with another chunk of data")
 }

 The problem now is that with each 'chunk' cycle the memory used by R becomes
 bigger and bigger until it exceeds my RAM, although the RAM needed for any
 single chunk cycle alone is only about a fifth of what I have overall.

 Does somebody have an idea why this behaviour might occur? Note that all the
 objects (like 'DummyCatcher') are reused every cycle, so I would assume
 that the RAM used should stay about the same after the first 'chunk' cycle.


 Best, Peter


 SystemInfo:

 R version 2.15.2 (2012-10-26)
 Platform: x86_64-w64-mingw32/x64 (64-bit)
 Win7 Enterprise, 8 GB RAM




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Memory filling up while looping

2012-12-21 Thread Peter Meißner

Thanks for your answer,

yes, I tried 'gc()'; it did not change the behavior.

best, Peter


--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany

+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/



Re: [R] Memory filling up while looping

2012-12-21 Thread Duncan Murdoch

You should pre-allocate your result matrix.  By growing it a few rows at 
a time, R needs to do this:


allocate it
allocate a bigger one, copy the old one in
delete the old one, leaving a small hole in memory
allocate a bigger one, copy the old one in
delete the old one, leaving a bigger hole in memory, but still too small 
to use...


etc.

If you are lucky, R might be able to combine some of those small holes 
into a bigger one and use that, but chances are other variables will 
have been created there in the meantime, so the holes will go mostly 
unused.  R never moves an object during garbage collection, so if you 
have fragmented memory, it's mostly wasted.


If you don't know how big the final result will be, then allocate large, 
and when you run out, allocate bigger.  Not as good as one allocation, 
but better than hundreds.
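
For instance, a rough sketch of that strategy applied to the dummy loop
(the initial size and the doubling rule here are arbitrary choices, not
anything R requires):

rows <- 100                                   # initial guess at the final size
DummyCatcher <- matrix(NA_real_, nrow = rows, ncol = 1)
n <- 0                                        # rows actually filled so far
for (i in 1:1000) {
    dummy <- i + 1
    n <- n + 1
    if (n > nrow(DummyCatcher)) {             # out of space: allocate bigger once
        DummyCatcher <- rbind(DummyCatcher,
                              matrix(NA_real_, nrow = nrow(DummyCatcher), ncol = 1))
    }
    DummyCatcher[n, ] <- dummy
}
DummyCatcher <- DummyCatcher[seq_len(n), , drop = FALSE]   # trim unused rows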


Duncan Murdoch






Re: [R] Memory filling up while looping

2012-12-21 Thread Peter Meißner
Here is a working example that reproduces the behavior by creating 1000
XML files and afterwards parsing them.

On my PC, R starts with about 90 MB of RAM; with every cycle another
10-12 MB are added to the RAM usage, so I end up with about 200 MB.

In the real code one chunk cycle eats about 800 MB of RAM, which was one
of the reasons I decided to split up the process into separate chunks in
the first place.



'Minimal' Example - START


# the general problem
require(XML)

chunk <- function(x, chunksize){
    # source: http://stackoverflow.com/a/3321659/1144966
    x2 <- seq_along(x)
    split(x, ceiling(x2/chunksize))
}

chunky <- chunk(paste("test", 1:1000, ".xml", sep=""), 100)

for(i in 1:1000){
    writeLines(c(paste('<?xml version="1.0"?>\n <note>\n <to>Tove</to>\n<nr>', i,
                       '</nr>\n<from>Jani</from>\n <heading>Reminder</heading>\n', sep=""),
                 paste(rep('<body>Do not forget me this weekend!</body>\n',
                           sample(1:10, 1)), sep=" "),
                 ' </note>'),
               paste("test", i, ".xml", sep=""))
}

for(k in 1:length(chunky)){
    gc()
    print(chunky[[k]])
    xmlCatcher <- NULL

    for(i in 1:length(chunky[[k]])){
        filename   <- chunky[[k]][i]
        xml        <- xmlTreeParse(filename)
        xml        <- xmlRoot(xml)
        result     <- sapply(getNodeSet(xml, "//body"), xmlValue)
        id         <- sapply(getNodeSet(xml, "//nr"), xmlValue)
        dummy      <- cbind(id, result)
        xmlCatcher <- rbind(xmlCatcher, dummy)
    }
    save(xmlCatcher, file=paste("xmlCatcher", k, ".RData", sep=""))
}


'Minimal' Example - END




On 21.12.2012 15:14, jim holtman wrote:

Can you send either your actual script or the console output so I can
get an idea of how fast memory is growing? Also, at the end, can you
list the sizes of the objects in the workspace? Here is a function I
use to get the space:

my.ls <-
function (pos = 1, sorted = FALSE, envir = as.environment(pos))
{
    .result <- sapply(ls(envir = envir, all.names = TRUE),
        function(..x) object.size(eval(as.symbol(..x), envir = envir)))
    if (length(.result) == 0)
        return("No objects to list")
    if (sorted) {
        .result <- rev(sort(.result))
    }
    .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
    names(.ls) <- "Size"
    .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0,
        format = "f")
    .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
        function(x) class(eval(as.symbol(x), envir = envir))[1L])), "---")
    .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
        function(x) length(eval(as.symbol(x), envir = envir)))), "---")
    .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
        function(x) paste(dim(eval(as.symbol(x), envir = envir)),
            collapse = " x "))), "---")
    .ls
}


which gives output like this:


> my.ls()
                 Size       Class  Length Dim
.Last             736    function       1
.my.env.jph        28 environment      39
x                 424     integer     100
y              40,024     integer   10000
z           4,000,024     integer 1000000
**Total     4,041,236         ---     --- ---



Re: [R] Memory filling up while looping

2012-12-21 Thread Peter Meißner
I'll consider it. But in fact the whole data does not fit into memory at 
once - at least not with the additional overhead of creating it, I think. 
That was one of the reasons I wanted to do it chunk by chunk in the first place.


Thanks, Best, Peter



Re: [R] Memory filling up while looping

2012-12-21 Thread Patrick Burns

Circle 2 of 'The R Inferno' may help you.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

In particular, it has an example of how to do what
Duncan suggested.
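
For the record, a minimal sketch of the pattern described there - collect
the pieces in a pre-created list and bind them once at the end (the size
and the values here are placeholders, not from Peter's real code):

n <- 100
pieces <- vector("list", n)            # pre-create the list once
for (i in seq_len(n)) {
    pieces[[i]] <- cbind(id = i, result = i + 1)
}
xmlCatcher <- do.call(rbind, pieces)   # a single rbind at the end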

Pat


Re: [R] Memory filling up while looping

2012-12-21 Thread Peter Meißner

Yeah, thanks,
I know: !DO NOT USE RBIND!

But it does not help: even using a predefined list to store the results, 
as suggested there, memory still fills up.

The problem seems to stem from the XML package and not from the way I 
store the data until it is saved.


Best, Peter




Re: [R] Memory filling up while looping

2012-12-21 Thread Milan Bouchet-Valat
On Friday, 21 December 2012 at 18:41 +0100, Peter Meißner wrote:
 The problem seems to stem from the XML package and not from the way I
 store the data until it is saved.
So you may want to use xmlParse() or the equivalent
xmlTreeParse(useInternalNodes=TRUE) instead of plain xmlTreeParse().
This will avoid creating too many R objects that will need to be freed.
But do not forget to call free() on the resulting object at the end of
the loop.

See this page for details:
http://www.omegahat.org/RSXML/MemoryManagement.html
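
A sketch of how the inner loop from the 'minimal' example might look with
this change (untested; chunky, k, xmlCatcher and the XPath expressions are
taken from that example):

require(XML)
xmlCatcher <- NULL
for (filename in chunky[[k]]) {
    doc        <- xmlParse(filename)   # parse to an internal (C-level) document
    result     <- sapply(getNodeSet(doc, "//body"), xmlValue)
    id         <- sapply(getNodeSet(doc, "//nr"), xmlValue)
    xmlCatcher <- rbind(xmlCatcher, cbind(id, result))
    free(doc)                          # release the C-level memory explicitly
}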


Regards



Re: [R] Memory filling up while looping

2012-12-21 Thread Peter Meißner

THANKS a lot!

This actually solved the problem even without calling free() explicitly:

xmlTreeParse(..., useInternalNodes=TRUE)

Best, Peter




Re: [R] Memory filling up while looping

2012-12-21 Thread jim holtman
I ran your code and did not see any growth:

          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  463828 24.8     818163 43.7   818163 43.7
Vcells  546318  4.2    1031040  7.9   909905  7.0
1 (1) - eval : 33.6 376.6 376.6 : 48.9MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  471049 25.2     818163 43.7   818163 43.7
Vcells  544105  4.2    1031040  7.9   909905  7.0
2 (1) - eval : 35.9 379.2 379.2 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  479520 25.7     818163 43.7   818163 43.7
Vcells  543882  4.2    1031040  7.9   909905  7.0
3 (1) - eval : 38.0 381.4 381.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  488376 26.1     818163 43.7   818163 43.7
Vcells  544191  4.2    1031040  7.9   909905  7.0
4 (1) - eval : 40.0 383.4 383.4 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  496695 26.6     818163 43.7   818163 43.7
Vcells  543971  4.2    1031040  7.9   909905  7.0
5 (1) - eval : 42.0 385.4 385.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  505562 27.0     899071 48.1   818163 43.7
Vcells  544034  4.2    1031040  7.9   909905  7.0
6 (1) - eval : 44.1 387.5 387.5 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  513896 27.5     899071 48.1   899071 48.1
Vcells  543973  4.2    1031040  7.9   909905  7.0
7 (1) - eval : 46.2 389.8 389.8 : 52.5MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  523203 28.0     899071 48.1   899071 48.1
Vcells  544751  4.2    1031040  7.9   909905  7.0
8 (1) - eval : 48.5 392.2 392.2 : 46.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  531519 28.4     899071 48.1   899071 48.1
Vcells  544418  4.2    1031040  7.9   909905  7.0
9 (1) - eval : 50.6 394.5 394.5 : 47.3MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  539556 28.9     899071 48.1   899071 48.1
Vcells  544057  4.2    1031040  7.9   909905  7.0
10 (1) - eval : 52.6 396.6 396.6 : 47.8MB

It started out with 48 MB and ended with 47 MB. This is with:

R version 2.15.2 (2012-10-26) -- Trick or Treat
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)


[R] Memory filling up while looping

2012-12-20 Thread Peter Meissner

Hey,

I have a double loop like this:


chunk <- list(1:10, 11:20, 21:30)
for(k in 1:length(chunk)){
    print(chunk[[k]])
    DummyCatcher <- NULL
    for(i in chunk[[k]]){
        print("i load something")
        dummy <- 1
        print("i do something")
        dummy <- dummy + 1
        print("i do put it together")
        DummyCatcher <- rbind(DummyCatcher, dummy)
    }
    print("i save a chunk and restart with another chunk of data")
}

The problem now is that with each 'chunk' cycle the memory used by R 
becomes bigger and bigger until it exceeds my RAM, although the RAM needed 
for any single chunk cycle alone is only about a fifth of what I have overall.


Does somebody have an idea why this behaviour might occur? Note that all 
the objects (like 'DummyCatcher') are reused every cycle, so I would 
assume that the RAM used should stay about the same after the first 
'chunk' cycle.



Best, Peter


SystemInfo:

R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Win7 Enterprise, 8 GB RAM

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.