Re: [R] Memory filling up while looping
Have you tried putting calls to 'gc()' at the top of the first loop to make sure memory is reclaimed? You can print the call to 'gc()' to see how fast memory usage is growing.

On Thu, Dec 20, 2012 at 6:26 PM, Peter Meissner peter.meiss...@uni-konstanz.de wrote:

> Hey,
>
> I have a double loop like this:
>
> chunk <- list(1:10, 11:20, 21:30)
>
> for(k in 1:length(chunk)){
>   print(chunk[k])
>   DummyCatcher <- NULL
>   for(i in chunk[[k]]){
>     print("i load something")
>     dummy <- 1
>     print("i do something")
>     dummy <- dummy + 1
>     print("i do put it together")
>     DummyCatcher <- rbind(DummyCatcher, dummy)
>   }
>   print("i save a chunk and restart with another chunk of data")
> }
>
> The problem now is that with each 'chunk' cycle the memory used by R becomes bigger and bigger until it exceeds my RAM, although the RAM any one chunk cycle needs on its own is only about a fifth of what I have overall.
>
> Does somebody have an idea why this behaviour might occur? Note that all the objects (like 'DummyCatcher') are reused every cycle, so I would assume that the RAM used should stay about the same after the first 'chunk' cycle.
>
> Best, Peter
>
> SystemInfo:
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Win7 Enterprise, 8 GB RAM

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
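[A minimal sketch of Jim's suggestion, applied to the dummy loop from the original post; the loop body is a stand-in for the real work. Printing the result of gc() each cycle shows whether "used" memory keeps growing between chunks:

chunk <- list(1:10, 11:20, 21:30)

for (k in 1:length(chunk)) {
  print(gc())                    # force a collection and print current usage
  DummyCatcher <- NULL
  for (i in chunk[[k]]) {
    dummy <- i + 1               # stand-in for "load/do something"
    DummyCatcher <- rbind(DummyCatcher, dummy)
  }
}
]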
Re: [R] Memory filling up while looping
Thanks for your answer. Yes, I tried 'gc()'; it did not change the behaviour.

Best, Peter

On 21.12.2012 13:37, jim holtman wrote:

> Have you tried putting calls to 'gc()' at the top of the first loop to make sure memory is reclaimed? You can print the call to 'gc()' to see how fast memory usage is growing.
>
> [...]

--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany
+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/
Re: [R] Memory filling up while looping
On 12-12-20 6:26 PM, Peter Meissner wrote:

> Hey,
>
> I have a double loop like this: [...] The problem now is that with each 'chunk' cycle the memory used by R becomes bigger and bigger until it exceeds my RAM, although the RAM any one chunk cycle needs on its own is only about a fifth of what I have overall. [...]

You should pre-allocate your result matrix. By growing it a few rows at a time, R needs to do this:

  allocate it
  allocate a bigger one, copy the old one in
  delete the old one, leaving a small hole in memory
  allocate a bigger one, copy the old one in
  delete the old one, leaving a bigger hole in memory, but still too small to use...
  etc.

If you are lucky, R might be able to combine some of those small holes into a bigger one and use that, but chances are other variables will have been created there in the meantime, so the holes will go mostly unused. R never moves an object during garbage collection, so if you have fragmented memory, it's mostly wasted.

If you don't know how big the final result will be, then allocate large, and when you run out, allocate bigger. Not as good as one allocation, but better than hundreds.

Duncan Murdoch
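[A minimal sketch of Duncan's "allocate large, grow rarely" advice, applied to the dummy loop from the original post; the starting size and the doubling rule are illustrative assumptions, not part of the original code:

nguess <- 1000                                   # assumed initial guess at row count
DummyCatcher <- matrix(NA_real_, nrow = nguess, ncol = 1)
nused <- 0

for (i in 1:2500) {                              # stand-in for the real per-item work
  dummy <- i + 1
  if (nused == nrow(DummyCatcher)) {             # out of room: double the allocation
    DummyCatcher <- rbind(DummyCatcher,
                          matrix(NA_real_, nrow = nrow(DummyCatcher), ncol = 1))
  }
  nused <- nused + 1
  DummyCatcher[nused, ] <- dummy
}
DummyCatcher <- DummyCatcher[seq_len(nused), , drop = FALSE]   # trim unused rows

This way the matrix is reallocated only a handful of times (here: at 1000 and 2000 rows) instead of once per iteration.]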
Re: [R] Memory filling up while looping
Here is a working example that reproduces the behaviour by creating 1000 XML files and afterwards parsing them. On my PC, R starts with about 90 MB of RAM in use; with every cycle another 10-12 MB are added, so I end up with about 200 MB of RAM usage. In the real code one chunk cycle eats about 800 MB of RAM, which was one of the reasons I decided to split up the process into separate chunks in the first place.

'Minimal' Example - START

# the general problem
require(XML)

chunk <- function(x, chunksize){
  # source: http://stackoverflow.com/a/3321659/1144966
  x2 <- seq_along(x)
  split(x, ceiling(x2/chunksize))
}

chunky <- chunk(paste("test", 1:1000, ".xml", sep=""), 100)

for(i in 1:1000){
  writeLines(c(paste('<?xml version="1.0"?>\n<note>\n<to>Tove</to>\n<nr>', i,
                     '</nr>\n<from>Jani</from>\n<heading>Reminder</heading>\n', sep=""),
               paste(rep('<body>Do not forget me this weekend!</body>\n',
                         sample(1:10, 1)), sep=""),
               '</note>'),
             paste("test", i, ".xml", sep=""))
}

for(k in 1:length(chunky)){
  gc()
  print(chunky[[k]])
  xmlCatcher <- NULL
  for(i in 1:length(chunky[[k]])){
    filename <- chunky[[k]][i]
    xml      <- xmlTreeParse(filename)
    xml      <- xmlRoot(xml)
    result   <- sapply(getNodeSet(xml, "//body"), xmlValue)
    id       <- sapply(getNodeSet(xml, "//nr"), xmlValue)
    dummy    <- cbind(id, result)
    xmlCatcher <- rbind(xmlCatcher, dummy)
  }
  save(xmlCatcher, file=paste("xmlCatcher", k, ".RData"))
}

'Minimal' Example - END

On 21.12.2012 15:14, jim holtman wrote:

> Can you send either your actual script or the console output so I can get an idea of how fast memory is growing? Also, at the end, can you list the sizes of the objects in the workspace? Here is a function I use to get the space:
>
> my.ls <- function (pos = 1, sorted = FALSE, envir = as.environment(pos))
> {
>     .result <- sapply(ls(envir = envir, all.names = TRUE),
>         function(..x) object.size(eval(as.symbol(..x), envir = envir)))
>     if (length(.result) == 0)
>         return("No objects to list")
>     if (sorted) {
>         .result <- rev(sort(.result))
>     }
>     .ls <- as.data.frame(rbind(as.matrix(.result), `**Total` = sum(.result)))
>     names(.ls) <- "Size"
>     .ls$Size <- formatC(.ls$Size, big.mark = ",", digits = 0, format = "f")
>     .ls$Class <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) class(eval(as.symbol(x), envir = envir))[1L])), "---")
>     .ls$Length <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) length(eval(as.symbol(x), envir = envir)))), "---")
>     .ls$Dim <- c(unlist(lapply(rownames(.ls)[-nrow(.ls)],
>         function(x) paste(dim(eval(as.symbol(x), envir = envir)),
>             collapse = " x "))), "---")
>     .ls
> }
>
> which gives output like this:
>
> > my.ls()
>                    Size       Class Length Dim
> .Last               736    function      1
> .my.env.jph          28 environment     39
> x                   424     integer    100
> y                40,024     integer      1
> z             4,000,024     integer    100
> **Total       4,041,236         ---    --- ---
>
> On Fri, Dec 21, 2012 at 8:03 AM, Peter Meißner peter.meiss...@uni-konstanz.de wrote:
>> Thanks for your answer. Yes, I tried 'gc()'; it did not change the behaviour. [...]
Re: [R] Memory filling up while looping
I'll consider it. But in fact the whole data does not fit into memory at once, with the overhead needed to create it on top of that - I think. That was one of the reasons I wanted to do it chunk by chunk in the first place.

Thanks, Best, Peter

On 21.12.2012 15:07, Duncan Murdoch wrote:

> On 12-12-20 6:26 PM, Peter Meissner wrote:
>> [...]
>
> You should pre-allocate your result matrix. By growing it a few rows at a time, R needs to allocate a bigger one, copy the old one in, and delete the old one, leaving holes in memory that mostly go unused. [...] If you don't know how big the final result will be, then allocate large, and when you run out, allocate bigger. Not as good as one allocation, but better than hundreds.
>
> Duncan Murdoch

--
Peter Meißner
Workgroup 'Comparative Parliamentary Politics'
Department of Politics and Administration
University of Konstanz
Box 216
78457 Konstanz
Germany
+49 7531 88 5665
http://www.polver.uni-konstanz.de/sieberer/home/
Re: [R] Memory filling up while looping
Circle 2 of 'The R Inferno' may help you:

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

In particular, it has an example of how to do what Duncan suggested.

Pat

On 21/12/2012 15:27, Peter Meißner wrote:

> Here is a working example that reproduces the behaviour by creating 1000 XML files and afterwards parsing them. On my PC, R starts with about 90 MB of RAM in use; with every cycle another 10-12 MB are added, so I end up with about 200 MB of RAM usage. In the real code one chunk cycle eats about 800 MB of RAM, which was one of the reasons I decided to split up the process into separate chunks in the first place.
>
> [...]
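[A sketch of the Circle 2 pattern ("grow a list, not a matrix") as it could apply here: collect per-file results in a pre-allocated list and rbind once at the end. The names and the per-file stand-in are illustrative assumptions:

pieces <- vector("list", 100)                    # one slot per file in a chunk
for (i in seq_along(pieces)) {
  pieces[[i]] <- cbind(id = i, result = i^2)     # stand-in for the per-file parsing
}
xmlCatcher <- do.call(rbind, pieces)             # one rbind instead of a hundred
]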
Re: [R] Memory filling up while looping
Yeah, thanks, I know: DO NOT USE RBIND! But it does not help: even when using a predefined list to store the results, as suggested there, memory still fills up. The problem seems to stem from the XML package and not from the way I store the data until it is saved.

Best, Peter

On 21.12.2012 18:33, Patrick Burns wrote:

> Circle 2 of 'The R Inferno' may help you: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf In particular, it has an example of how to do what Duncan suggested.
>
> Pat
>
> [...]
Re: [R] Memory filling up while looping
On Friday 21 December 2012 at 18:41 +0100, Peter Meißner wrote:

> Yeah, thanks, I know: DO NOT USE RBIND! But it does not help: even when using a predefined list to store the results, as suggested there, memory still fills up. The problem seems to stem from the XML package and not from the way I store the data until it is saved.

So you may want to use xmlParse(), or the equivalent xmlTreeParse(..., useInternalNodes=TRUE), instead of plain xmlTreeParse(). This will avoid creating too many R objects that will need to be freed. But do not forget to call free() on the resulting object at the end of the loop. See this page for details:

http://www.omegahat.org/RSXML/MemoryManagement.html

Regards
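[A minimal sketch of Milan's suggestion applied to the chunk loop of the 'minimal' example above. xmlParse() and free() are the XML package's calls named in his message; the pre-allocated list is carried over from the earlier advice, and 'chunky' is assumed to exist as in the example:

require(XML)

for (k in 1:length(chunky)) {
  pieces <- vector("list", length(chunky[[k]]))
  for (i in 1:length(chunky[[k]])) {
    doc <- xmlParse(chunky[[k]][i])              # C-level DOM; few R objects created
    pieces[[i]] <- cbind(sapply(getNodeSet(doc, "//nr"), xmlValue),
                         sapply(getNodeSet(doc, "//body"), xmlValue))
    free(doc)                                    # release the C-level document
  }
  xmlCatcher <- do.call(rbind, pieces)
  save(xmlCatcher, file = paste("xmlCatcher", k, ".RData", sep = ""))
}
]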
Re: [R] Memory filling up while looping
THANKS a lot! This actually solved the problem, even without calling free() explicitly:

xmlTreeParse(..., useInternalNodes=TRUE)

Best, Peter

On 21.12.2012 19:48, Milan Bouchet-Valat wrote:

> [...] So you may want to use xmlParse(), or the equivalent xmlTreeParse(..., useInternalNodes=TRUE), instead of plain xmlTreeParse(). This will avoid creating too many R objects that will need to be freed. But do not forget to call free() on the resulting object at the end of the loop. See this page for details: http://www.omegahat.org/RSXML/MemoryManagement.html
>
> Regards
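[For reference, the one-line change in the 'minimal' example that this corresponds to - a sketch; the rest of the loop stays as posted, and keeping the free() call is still prudent per the previous message:

xml <- xmlTreeParse(filename, useInternalNodes = TRUE)   # was: xmlTreeParse(filename)
# ... getNodeSet()/xmlValue() as before ...
free(xml)                                                # release the internal document
]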
Re: [R] Memory filling up while looping
I ran your code and did not see any growth:

          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  463828 24.8     818163 43.7   818163 43.7
Vcells  546318  4.2    1031040  7.9   909905  7.0
1 (1) - eval : 33.6 376.6 376.6 : 48.9MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  471049 25.2     818163 43.7   818163 43.7
Vcells  544105  4.2    1031040  7.9   909905  7.0
2 (1) - eval : 35.9 379.2 379.2 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  479520 25.7     818163 43.7   818163 43.7
Vcells  543882  4.2    1031040  7.9   909905  7.0
3 (1) - eval : 38.0 381.4 381.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  488376 26.1     818163 43.7   818163 43.7
Vcells  544191  4.2    1031040  7.9   909905  7.0
4 (1) - eval : 40.0 383.4 383.4 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  496695 26.6     818163 43.7   818163 43.7
Vcells  543971  4.2    1031040  7.9   909905  7.0
5 (1) - eval : 42.0 385.4 385.4 : 48.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  505562 27.0     899071 48.1   818163 43.7
Vcells  544034  4.2    1031040  7.9   909905  7.0
6 (1) - eval : 44.1 387.5 387.5 : 48.8MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  513896 27.5     899071 48.1   899071 48.1
Vcells  543973  4.2    1031040  7.9   909905  7.0
7 (1) - eval : 46.2 389.8 389.8 : 52.5MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  523203 28.0     899071 48.1   899071 48.1
Vcells  544751  4.2    1031040  7.9   909905  7.0
8 (1) - eval : 48.5 392.2 392.2 : 46.7MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  531519 28.4     899071 48.1   899071 48.1
Vcells  544418  4.2    1031040  7.9   909905  7.0
9 (1) - eval : 50.6 394.5 394.5 : 47.3MB
          used (Mb) gc trigger (Mb) max used (Mb)
Ncells  539556 28.9     899071 48.1   899071 48.1
Vcells  544057  4.2    1031040  7.9   909905  7.0
10 (1) - eval : 52.6 396.6 396.6 : 47.8MB

It started out with 48M and ended with 47M. This is with

R version 2.15.2 (2012-10-26) -- "Trick or Treat"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-w64-mingw32/x64 (64-bit)

On Fri, Dec 21, 2012 at 10:27 AM, Peter Meißner peter.meiss...@uni-konstanz.de wrote:

> Here is a working example that reproduces the behaviour by creating 1000 XML files and afterwards parsing them. On my PC, R starts with about 90 MB of RAM in use; with every cycle another 10-12 MB are added, so I end up with about 200 MB of RAM usage. In the real code one chunk cycle eats about 800 MB of RAM, which was one of the reasons I decided to split up the process into separate chunks in the first place.
>
> [...]
[R] Memory filling up while looping
Hey,

I have a double loop like this:

chunk <- list(1:10, 11:20, 21:30)

for(k in 1:length(chunk)){
  print(chunk[k])
  DummyCatcher <- NULL
  for(i in chunk[[k]]){
    print("i load something")
    dummy <- 1
    print("i do something")
    dummy <- dummy + 1
    print("i do put it together")
    DummyCatcher <- rbind(DummyCatcher, dummy)
  }
  print("i save a chunk and restart with another chunk of data")
}

The problem now is that with each 'chunk' cycle the memory used by R becomes bigger and bigger until it exceeds my RAM, although the RAM any one chunk cycle needs on its own is only about a fifth of what I have overall.

Does somebody have an idea why this behaviour might occur? Note that all the objects (like 'DummyCatcher') are reused every cycle, so I would assume that the RAM used should stay about the same after the first 'chunk' cycle.

Best, Peter

SystemInfo:
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Win7 Enterprise, 8 GB RAM