On 3/28/2007 8:17 PM, Henrik Bengtsson wrote:
> On 3/28/07, Duncan Murdoch <[EMAIL PROTECTED]> wrote:
>> On 3/28/2007 5:25 PM, Henrik Bengtsson wrote:
>>> Hi,
>>>
>>> when doing as.double() on an object that is already a double, the
>>> object seems to be copied internally, doubling the memory requirement.
>>> See example below.  Same for as.character() etc.  Is this intended?
>>>
>>> Example:
>>>
>>> % R --vanilla
>>>> x <- double(1e7)
>>>> gc()
>>>            used (Mb) gc trigger  (Mb) max used  (Mb)
>>> Ncells   234019  6.3     467875  12.5   350000   9.4
>>> Vcells 10103774 77.1   11476770  87.6 10104223  77.1
>>>> x <- as.double(x)
>>>> gc()
>>>            used (Mb) gc trigger  (Mb) max used  (Mb)
>>> Ncells   234113  6.3     467875  12.5   350000   9.4
>>> Vcells 10103790 77.1   21354156 163.0 20103818 153.4
>>>
>>> However, couldn't this easily be avoided by letting as.double() return
>>> the object as is if already a double?
>>
>> as.double calls the internal as.vector, which also strips off
>> attributes.  But in the case where the output is identical to the
>> input, this does seem like an easy optimization.  I don't know if it
>> would help most people, but it might help in the kinds of cases you
>> mention.
>
> What about,
>
>   as.double.double <- function(x, ...) {
>     if (is.null(attributes(x))) x else NextMethod("as.double", x, ...)
>   }
>
> and the same for as.integer(), as.logical(), as.complex(), as.raw(), and
> as.character()?
Yes, something like that, except it should be within the internal as.vector
code.  Writing it in R code would impact all users, and might even negate
any advantage you got from the lack of duplication.  For example, you'll be
duplicating the attributes of x with the code above, but internal code
could do the test without the duplication.

Duncan Murdoch

> /Henrik
>
>> Duncan Murdoch
>>
>>> Example:
>>>
>>> % R --vanilla
>>>> as.double.double <- function(x, ...) x
>>>> x <- double(1e7)
>>>> gc()
>>>            used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells   234019  6.3     467875 12.5   350000  9.4
>>> Vcells 10103774 77.1   11476770 87.6 10104223 77.1
>>>> x <- as.double(x)
>>>> gc()
>>>            used (Mb) gc trigger (Mb) max used (Mb)
>>> Ncells   234028  6.3     467875 12.5   350000  9.4
>>> Vcells 10103779 77.1   12130608 92.6 10104223 77.1
>>>
>>> What's the catch?
>>>
>>>
>>> The reason why I bring it up is that many (most?) methods use
>>> as.double() etc. "just in case" when passing arguments to .Call(),
>>> .Fortran() etc., e.g. stats::smooth.spline():
>>>
>>>   fit <- .Fortran(R_qsbart, as.double(penalty), as.double(dofoff),
>>>     x = as.double(xbar), y = as.double(ybar), w = as.double(wbar),
>>>     <etc>)
>>>
>>> Your memory usage peaks in the actual call and the garbage collector
>>> cannot clean it up until after the call.  This seems to be a waste of
>>> memory, especially when the objects are large (100-1000 MB).
>>>
>>> Cheers
>>>
>>> Henrik

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
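In the meantime, the kind of guard a package author could use at the call
site might look like the sketch below.  This is only an illustration, not
anything from R itself or from the thread above; the helper name asDouble
is made up.  It keeps the change local to one package, which avoids the
concern about a global as.double.double() method affecting all users:

  ## Minimal sketch: coerce only when necessary.  If x is already a
  ## double vector with no attributes, as.double(x) would return an
  ## identical object, so returning x as-is avoids the extra copy
  ## reported in the gc() output above.
  asDouble <- function(x) {
    if (is.double(x) && is.null(attributes(x))) x else as.double(x)
  }

  ## e.g. in a wrapper around .Fortran()/.C(), along the lines of the
  ## smooth.spline() call quoted above:
  ##   fit <- .Fortran(R_qsbart, asDouble(penalty), asDouble(dofoff), ...)

The attributes() check matters because is.double() is also TRUE for
double matrices and classed numeric objects, where the coercion (and the
attribute stripping that goes with it) still has to happen.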