On 07/12/2010 03:00 PM, cstrato wrote:
> Dear Martin,
>
> Thank you, you are right, now I get:
>
>> ann <- read.delim("Hu6800_ann.txt", stringsAsFactors=FALSE)
>> object.size(ann)
> 2035952 bytes
>> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
>> object.size(u2p)
> 1207368 bytes
>> object.size(unlist(u2p))
> 865176 bytes
>
> Nevertheless, a size of 1.2MB for a list representing 2 of 11 columns of
but it's a list of length(unique(ann[["UNIT_ID"]])) elements, each of
which has a pointer to the element, a pointer to the name of the
element, and the element data itself. I'd guess it adds up in a
non-mysterious way. For a sense of it (and maybe only understandable if
you have a working understanding of how R represents data) see, e.g.,

> .Internal(inspect(list(x=1,y=2)))
@1a4c538 19 VECSXP g0c2 [ATT] (len=2, tl=0)
  @191cad8 14 REALSXP g0c1 [] (len=1, tl=0) 1
  @191caa8 14 REALSXP g0c1 [] (len=1, tl=0) 2
ATTRIB:
  @16fc8d8 02 LISTSXP g0c0 []
    TAG: @60cf18 01 SYMSXP g0c0 [MARK,NAM(2),gp=0x4000] "names"
    @1a4c500 16 STRSXP g0c2 [] (len=2, tl=0)
      @674e88 09 CHARSXP g0c1 [MARK,gp=0x21] "x"
      @728c38 09 CHARSXP g0c1 [MARK,gp=0x21] "y"

Martin

> a table of size 754KB still seems to be pretty large?
>
> Best regards
> Christian
>
>
> On 7/12/10 11:44 PM, Martin Morgan wrote:
>> On 07/12/2010 01:45 PM, cstrato wrote:
>>> Dear all,
>>>
>>> With great interest I followed the discussion:
>>> https://stat.ethz.ch/pipermail/r-devel/2010-July/057901.html
>>> since I currently have a similar problem:
>>>
>>> In a new R session (using xterm) I am importing a simple table
>>> "Hu6800_ann.txt" which has a size of only 754KB:
>>>
>>>> ann <- read.delim("Hu6800_ann.txt")
>>>> dim(ann)
>>> [1] 7129   11
>>>
>>> When I call "object.size(ann)" the estimated memory used to store
>>> "ann" is already 2MB:
>>>
>>>> object.size(ann)
>>> 2034784 bytes
>>>
>>> Now I call "split()" and check the estimated memory used, which
>>> turns out to be 3.3GB:
>>>
>>>> u2p <- split(ann[,"ProbesetID"], ann[,"UNIT_ID"])
>>>> object.size(u2p)
>>> 3323768120 bytes
>>
>> I guess things improve with stringsAsFactors=FALSE in read.delim?
>>
>> Martin
>>
>>> During the R session I am running "top" in another xterm and can see
>>> that the memory usage of R increases to about 550MB RSIZE.
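[Editor's note: Martin's point about per-element headers can be seen with object.size() itself. A minimal sketch on toy data; exact byte counts vary by platform and R version, and only the relative sizes matter here.]

```r
x <- rnorm(10000)

# One REALSXP: a single SEXP header plus 10000 * 8 bytes of data.
size_vec <- object.size(x)

# 10000 length-1 REALSXPs: every element pays its own header, and the
# enclosing VECSXP adds a pointer per element on top.
size_list <- object.size(as.list(x))

size_vec
size_list
```

On a 64-bit build the list comes out several times larger than the flat vector, for exactly the reason the .Internal(inspect(...)) dump shows: each list element is a complete SEXP, not just 8 bytes of payload.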
>>>
>>>
>>> Now I do:
>>>
>>>> object.size(unlist(u2p))
>>> 894056 bytes
>>>
>>> It takes about 3 minutes to complete this call, and the memory usage
>>> of R increases to about 1.3GB RSIZE. Furthermore, during evaluation
>>> of this function the free RAM of my Mac decreases to less than 8MB
>>> free PhysMem, until it needs to swap memory. When finished, free
>>> PhysMem is 734MB but the size of R has increased to 577MB RSIZE.
>>>
>>> Doing "split(ann[,"ProbesetID"], ann[,"UNIT_ID"], drop=TRUE)" did
>>> not change the object.size; only processing was faster and it used
>>> less memory on my Mac.
>>>
>>> Do you have any idea what the reason for this behavior is?
>>> Why is the size of the list "u2p" so large?
>>> Am I making a mistake?
>>>
>>> Here is my sessionInfo on a MacBook Pro with 2GB RAM:
>>>
>>>> sessionInfo()
>>> R version 2.11.1 (2010-05-31)
>>> x86_64-apple-darwin9.8.0
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> Best regards
>>> Christian
>>> _._._._._._._._._._._._._._._._._._
>>> C.h.r.i.s.t.i.a.n S.t.r.a.t.o.w.a
>>> V.i.e.n.n.a A.u.s.t.r.i.a
>>> e.m.a.i.l: cstrato at aon.at
>>> _._._._._._._._._._._._._._._._._._
>>>
>>> ______________________________________________
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
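[Editor's note: a plausible reconstruction of the 3.3GB figure, on toy data (the sizes and names below are hypothetical, not taken from Hu6800_ann.txt). With read.delim's default stringsAsFactors=TRUE, ProbesetID is a factor, and split() leaves the full levels attribute on every list element; object.size() does not detect that the levels are shared, so it counts all the level strings once per element. This is consistent with Martin's suggestion that stringsAsFactors=FALSE makes the problem go away.]

```r
n <- 2000

vals <- factor(sprintf("probe_%04d", seq_len(n)))  # factor with n levels
grp  <- seq_len(n)                                 # one group per value

# Every element of fac_split is a length-1 factor that still carries
# all n levels; object.size() sums that attribute once per element, so
# the reported size grows roughly as n^2 strings.
fac_split <- split(vals, grp)

# With plain character values there is no levels attribute, and the
# reported size stays roughly linear in n.
chr_split <- split(as.character(vals), grp)

object.size(fac_split)
object.size(chr_split)
```

Scaling n up toward the 7129 rows of the annotation table makes the factor version balloon toward the gigabyte range while the character version stays in the megabytes.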