For the last case with the list:

> x <- 1:2; y = list(x)[rep(1, 4)]
> .Internal(inspect(y))
@102bbe090 19 VECSXP g0c3 [MARK,NAM(2)] (len=4, tl=0)
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119628 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
> y[[1]][1] <- 2L # everybody copied
> .Internal(inspect(y))
@102fca698 19 VECSXP g0c3 [NAM(1)] (len=4, tl=0)
  @1061196b8 13 INTSXP g0c1 [] (len=2, tl=0) 2,2
  @106119688 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @106119658 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
  @106119718 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
> y1 <- y[[1]]; y1[1] <- 3L; y[[1]] <- y1 # only one copied
> .Internal(inspect(y))
@102fca698 19 VECSXP g0c3 [MARK,NAM(1)] (len=4, tl=0)
  @10610b7a8 13 INTSXP g0c1 [MARK] (len=2, tl=0) 3,2
  @106119688 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119658 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
  @106119718 13 INTSXP g0c1 [MARK] (len=2, tl=0) 1,2
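The two update styles in the session above can be compared side by side. This is a minimal sketch, assuming a current R; `make_shared_list()` is a hypothetical helper introduced only for this illustration, and `.Internal(inspect())` is an internal, undocumented facility whose addresses change on every run. Since R 4.0.0 replaced the NAMED mechanism with reference counting, the copying pattern may also differ from the R 3.x output quoted in this thread; only the resulting values are guaranteed to match.

```r
# Hypothetical helper: returns a fresh list whose four slots all refer to
# one integer vector, like list(x)[rep(1, 4)] in the session above.
make_shared_list <- function() {
  x <- c(1L, 2L)  # explicit vector, avoiding the compact (ALTREP) form of 1:2 in R >= 3.5.0
  list(x)[rep(1, 4)]
}

# Style 1: nested replacement in a single expression ("double subset")
ya <- make_shared_list()
.Internal(inspect(ya))   # element addresses before the update
ya[[1]][1] <- 2L
.Internal(inspect(ya))   # which slots now point to new memory?

# Style 2: extract one element, modify it, and put it back with [[<-
yb <- make_shared_list()
.Internal(inspect(yb))
y1 <- yb[[1]]
y1[1] <- 2L
yb[[1]] <- y1
.Internal(inspect(yb))

# Both styles give the same value; only the copying behaviour can differ.
stopifnot(identical(ya, yb))
```

Comparing the element addresses printed before and after each update is the same check Philippe made by hand.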
Assignment to a "double subset" of a list (y[[1]][1] <- 2L) seems to trigger a full copy of the list, including all of its elements, but `[[<-` alone appears smart enough to avoid copying the other elements of the list.

Best,

Philippe

Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons University, Belgium

On 29 Jan 2014, at 00:53, Ross Boylan <r...@biostat.ucsf.edu> wrote:

> Thank you for a very thorough analysis. It seems whether or not an
> operation makes a full copy really depends on the specific operation,
> and that it is not safe to assume that because I know something is
> unchanged there will be no copy. For example, in your last case only
> one element of a list was modified, but all the list elements got new
> memory.
>
> BTW, one reason I got into this, aside from wanting to save memory, is
> that I found my code was spending a lot of time in areas that probably
> involved getting new memory. So it mattered for speed too.
>
> Ross
>
> On Mon, 2014-01-27 at 06:33 -0800, Martin Morgan wrote:
>> Hi Ross --
>>
>> On 01/23/2014 05:53 PM, Ross Boylan wrote:
>>> [Apologies if a duplicate; we are having mail problems.]
>>>
>>> I am trying to understand the circumstances under which R makes a copy
>>> of an object, as opposed to simply referring to it. I'm talking about
>>> what goes on under the hood, not the user semantics. I'm doing things
>>> that take a lot of memory, and am trying to minimize my use.
>>>
>>> I thought that R was clever so that copies were created lazily. For
>>> example, if a is a matrix, then
>>> b <- a
>>> b & a referred to the same object underneath, so that a complete
>>> duplicate (deep copy) wasn't made until it was necessary, e.g.,
>>> b[3, 1] <- 4
>>> would duplicate the contents of a to b, and then overwrite them.
>>
>> Compiling your R with --enable-memory-profiling gives access to the
>> tracemem() function, showing that your understanding above is correct:
>>
>>> b = matrix(0, 3, 2)
>>> tracemem(b)
>> [1] "<0x7054020>"
>>> a = b ## no copy
>>> b[3, 1] = 2 ## copy
>> tracemem[0x7054020 -> 0x7053fc8]:
>>> b = matrix(0, 3, 2)
>>> tracemem(b)
>> [1] "<0x680e258>"
>>> b[3, 1] = 2 ## no copy
>>>
>>
>> The same is apparent using .Internal(inspect()), where the first field
>> (@7053ec0) is the address of the data. The other relevant part is the
>> 'NAM()' field, which indicates whether there are 0, 1 or (have been) at
>> least 2 symbols referring to the data. NAM() increments from 1 (no
>> duplication on modify required) on original creation to 2 when a = b
>> (duplicate on modify).
>>
>>> b = matrix(0, 3, 2)
>>> .Internal(inspect(b))
>> @7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,0,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> b[3, 1] = 2
>>> .Internal(inspect(b))
>> @7053ec0 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> a = b
>>> .Internal(inspect(b)) ## data address unchanged
>> @7053ec0 14 REALSXP g0c4 [NAM(2),ATT] (len=6, tl=0) 0,0,0,0,0,...
>> ATTRIB:
>>   @7057528 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7056858 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>> b[3, 1] = 2
>>> .Internal(inspect(b)) ## data address changed
>> @7232910 14 REALSXP g0c4 [NAM(1),ATT] (len=6, tl=0) 0,0,2,0,0,...
>> ATTRIB:
>>   @7239d28 02 LISTSXP g0c0 []
>>     TAG: @21c5fb8 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
>>     @7237b48 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 3,2
>>
>>
>>>
>>> The following log, from R 3.0.1, does not seem to act that way; I get
>>> the same amount of memory used whether I copy the same object repeatedly
>>> or create new objects of the same size.
>>>
>>> Can anyone explain what is going on? Am I just wrong that copies are
>>> initially shallow? Or perhaps that behavior only applies for function
>>> arguments? Or doesn't apply for class slots or reference class
>>> variables?
>>>
>>>> foo <- setRefClass("foo", fields=list(x="ANY"))
>>>> bar <- setClass("bar", slots=c("x"))
>>
>> Using the approach above, we can see that creating an S4 or reference
>> object in the way you've indicated (validity checks or other
>> initialization might change this) does not copy the data, although it is
>> marked for duplication:
>>
>>> x = 1:2; .Internal(inspect(x))
>> @7553868 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(foo(x=x)$x))
>> @7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>> .Internal(inspect(bar(x=x)@x))
>> @7553868 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>
>> On the other hand, lapply is creating copies:
>>
>>> x = 1:2; .Internal(inspect(x))
>> @757b5a8 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(lapply(1:2, function(i) x)))
>> @7551f88 19 VECSXP g0c2 [] (len=2, tl=0)
>>   @757b428 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @757b3f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>
>> One can construct a list without copies:
>>
>>> x = 1:2; .Internal(inspect(x))
>> @7677c18 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
>>> .Internal(inspect(list(x)[rep(1, 2)]))
>> @767b080 19 VECSXP g0c2 [NAM(2)] (len=2, tl=0)
>>   @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @7677c18 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>
>> but that (creating a list of identical elements) doesn't seem to be a likely
>> real-world scenario and the
>> gain is transient
>>
>>> x = 1:2; y = list(x)[rep(1, 4)]
>>> .Internal(inspect(y))
>> @507bef8 19 VECSXP g0c3 [NAM(2)] (len=4, tl=0)
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>   @514ff98 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 1,2
>>> y[[1]][1] = 2L ## everybody copied
>>> .Internal(inspect(y))
>> @507bf40 19 VECSXP g0c3 [NAM(1)] (len=4, tl=0)
>>   @51502c8 13 INTSXP g0c1 [] (len=2, tl=0) 2,2
>>   @51502f8 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @5150328 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>   @5150358 13 INTSXP g0c1 [] (len=2, tl=0) 1,2
>>
>>
>> Probably it is more helpful to think of reducing the number of times an
>> object is _modified_, e.g., representing data as vectors and doing
>> vectorized updates.
>>
>> Martin
>>
>>>> mycoef <- list(a=matrix(rnorm(200000), ncol=2000), b=array(rnorm(200000),
>>> dim=c(4, 5, 10000)))
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2650747  141.6    4170209   222.8    4170209   222.8
>>> Vcells 799751724 6101.7 1711485496 13057.6 1711485493 13057.6
>>>> a <- lapply(1:100, function(i) bar(x=mycoef)) # create 100 objects that
>>> contain copies
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2652156  141.7    4170209   222.8    4170209   222.8
>>> Vcells 839752640 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # +305 Mb
>>>> b <- lapply(1:100, function(i) foo(x=mycoef)) # same with a reference
>>>> class
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2654761  141.8    4170209   222.8    4170209   222.8
>>> Vcells 879756752 6712.1 1711485496 13057.6 1711485493 13057.6
>>> # also + 305 Mb
>>>> rm("a", "b")
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2650660  141.6    4170209   222.8    4170209   222.8
>>> Vcells 799751664 6101.7 1711485496 13057.6 1711485493 13057.6
>>> # write to "copy" to see if it uses more memory
>>>> a <- lapply(1:100, function(i) {r <- bar(x=mycoef); r@x$a[5, 10] <- 33; r}
>>>> )
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2652174  141.7    4170209   222.8    4170209   222.8
>>> Vcells 839752684 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # also + 305 Mb
>>>> rm("a", "b")
>>> Warning message:
>>> In rm("a", "b") : object 'b' not found
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2650680  141.6    4170209   222.8    4170209   222.8
>>> Vcells 799751684 6101.7 1711485496 13057.6 1711485493 13057.6
>>> # now create completely distinct objects
>>>> a <- lapply(1:100, function(i) {acoef <- list(a=matrix(rnorm(200000),
>>> ncol=2000), b=array(rnorm(200000), dim=c(4, 5, 10000)))
>>> !+ bar(x=acoef)})
>>>> gc()
>>>             used   (Mb) gc trigger    (Mb)   max used    (Mb)
>>> Ncells   2652191  141.7    4170209   222.8    4170209   222.8
>>> Vcells 839752699 6406.9 1711485496 13057.6 1711485493 13057.6
>>> # + 305 Mb
>>>
>>> Thanks.
>>> Ross Boylan
>>>
>>> P.S. I also tried posting this from a google-managed email account, and
>>> have got back two messages like this:
>>>
>>> Mail Delivery Subsystem mailer-dae...@googlemail.com
>>>
>>> This is an automatically generated Delivery Status Notification
>>>
>>> THIS IS A WARNING MESSAGE ONLY.
>>>
>>> YOU DO NOT NEED TO RESEND YOUR MESSAGE.
>>>
>>> Delivery to the following recipient has been delayed:
>>>
>>> r-h...@r.project.org
>>>
>>> Message will be retried for 1 more day(s)
>>>
>>> Technical details of temporary failure:
>>> The recipient server did not accept our requests to connect. Learn more at
>>> http://support.google.com/mail/bin/answer.py?answer=7720
>>> [(0) r.project.org.
>>> [206.188.192.100]:25: Connection refused]
>>>
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
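Martin's suggestion to reduce the number of times an object is modified, by representing data as vectors and doing vectorized updates, can be illustrated with a minimal sketch. The names `n`, `idx`, `vals`, `v_loop` and `v_vec` are illustrative only, and the timings reported by `system.time()` depend on the machine and R version, so no figures are claimed here.

```r
n <- 1e5
idx <- seq(1, n, by = 2)   # positions to update
vals <- sqrt(idx)          # new values for those positions

# Element by element: each iteration calls the replacement function `[<-`
# once, so v_loop is modified length(idx) separate times.
v_loop <- numeric(n)
t_loop <- system.time(
  for (k in seq_along(idx)) v_loop[idx[k]] <- vals[k]
)

# Vectorized: a single call to `[<-` updates every position at once.
v_vec <- numeric(n)
t_vec <- system.time(v_vec[idx] <- vals)

stopifnot(identical(v_loop, v_vec))   # same result either way
cat("loop:", t_loop[["elapsed"]], "s; vectorized:", t_vec[["elapsed"]], "s\n")
```

The result is identical; the difference is that the vectorized form touches the object once instead of tens of thousands of times, which is the kind of saving Martin points to.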