I have to defer to others for policy declarations like how long the current format used by load and save should be readable.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: Heinz Tuechler [mailto:tuech...@gmx.at]
> Sent: Wednesday, October 30, 2013 1:43 PM
> To: William Dunlap
> Cc: Carl Witthoft; r-help@r-project.org
> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
>
> Many thanks for confirming my impression. I use dump for storing large
> data.frames with a number of attributes for each variable. save/load is
> much faster, but I am unsure whether such files will be readable by R
> versions years later.
> What format/functions would you suggest for data storage/transfer
> between different (future) R versions?
>
> best regards,
> Heinz
>
> on 30.10.2013 20:11, William Dunlap wrote:
> > I see a big 2.15.2/3.0.2 speed difference in parse() (which is used by source())
> > when it is parsing long vectors of numeric data. dump/source has never been an
> > efficient way of transferring data between different R sessions, but it is much
> > worse now for long vectors. In 2.15.2 doubling the size of the vector (of lengths
> > in the range 10^4 to 10^7) makes the time to parse go up by a factor of c. 2.1.
> > In 3.0.2 that factor is more like 4.4.
> >
> >          n elapsed-2.15.2 elapsed-3.0.2
> >       2048          0.003         0.018
> >       4096          0.006         0.065
> >       8192          0.013         0.254
> >      16384          0.025         1.067
> >      32768          0.050         4.114
> >      65536          0.100        16.236
> >     131072          0.219        66.013
> >     262144          0.808       291.883
> >     524288          2.022      1285.265
> >    1048576          4.918            NA
> >    2097152          9.857            NA
> >    4194304         22.916            NA
> >    8388608         49.671            NA
> >   16777216        101.042            NA
> >   33554432        512.719            NA
> >
> > I tried this with 64-bit R on a Linux box. The NA's represent sizes that did not
> > finish while I was at a 1 1/2 hour dentist's appointment.
> > The timing function was:
> >    test <- function(n = 2^(11:25))
> >    {
> >        tf <- tempfile()
> >        on.exit(unlink(tf))
> >        t(sapply(n, function(n){
> >            dput(log(seq_len(n)), file=tf)
> >            print(c(n=n, system.time(parse(file=tf))[1:3]))
> >        }))
> >    }
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf
> >> Of Carl Witthoft
> >> Sent: Wednesday, October 30, 2013 5:29 AM
> >> To: r-help@r-project.org
> >> Subject: Re: [R] big speed difference in source btw. R 2.15.2 and R 3.0.2 ?
> >>
> >> Did you run the identical code on the identical machine, and did you verify
> >> there were no other tasks running which might have limited the RAM available
> >> to R? And equally important, did you run these tests in the reverse order
> >> (in case R was storing large objects from the first run, thus chewing up
> >> RAM)?
> >>
> >>
> >>
> >> Dear All,
> >>
> >> is it known that source works much faster in R 2.15.2 than in R 3.0.2 ?
> >> In the example below I observe e.g. for a data.frame with 10^7 rows the
> >> following timings:
> >>
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> length: 1e+07
> >>    user  system elapsed
> >>   62.04    0.22   62.26
> >>
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> length: 1e+07
> >>    user  system elapsed
> >>  388.63  176.42  566.41
> >>
> >> Is there a way to speed R version 3.0.2 up to the performance of R
> >> version 2.15.2?
> >>
> >> best regards,
> >>
> >> Heinz Tüchler
> >>
> >>
> >> example:
> >> sessionInfo()
> >> sample.vec <-
> >>    c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >>      'named', 'file', 'or', 'URL', 'or', 'connection')
> >> dmp.size <- c(10^(1:7))
> >> set.seed(37)
> >>
> >> for(i in dmp.size) {
> >>    df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >>    dump('df0', file='testdump')
> >>    cat('length:', i, '\n')
> >>    print(system.time(source('testdump', keep.source = FALSE,
> >>                             encoding='')))
> >> }
> >>
> >> output for R version 2.15.2 Patched (2012-11-29 r61184):
> >>> sessionInfo()
> >> R version 2.15.2 Patched (2012-11-29 r61184)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> sample.vec <-
> >> +    c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >> +      'named', 'file', 'or', 'URL', 'or', 'connection')
> >>> dmp.size <- c(10^(1:7))
> >>> set.seed(37)
> >>>
> >>> for(i in dmp.size) {
> >> +    df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >> +    dump('df0', file='testdump')
> >> +    cat('length:', i, '\n')
> >> +    print(system.time(source('testdump', keep.source = FALSE,
> >> +                             encoding='')))
> >> + }
> >> length: 10
> >>    user  system elapsed
> >>       0       0       0
> >> length: 100
> >>    user  system elapsed
> >>       0       0       0
> >> length: 1000
> >>    user  system elapsed
> >>       0       0       0
> >> length: 10000
> >>    user  system elapsed
> >>    0.02    0.00    0.01
> >> length: 1e+05
> >>    user  system elapsed
> >>    0.21    0.00    0.20
> >> length: 1e+06
> >>    user  system elapsed
> >>    4.47    0.04    4.51
> >> length: 1e+07
> >>    user  system elapsed
> >>   62.04    0.22   62.26
> >>>
> >>
> >>
> >> output for R version 3.0.2 Patched (2013-10-27 r64116):
> >>> sessionInfo()
> >> R version 3.0.2 Patched (2013-10-27 r64116)
> >> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>
> >> locale:
> >> [1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252
> >> [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
> >> [5] LC_TIME=German_Switzerland.1252
> >>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
> >>> sample.vec <-
> >> +    c('source', 'causes', 'R', 'to', 'accept', 'its', 'input', 'from', 'the',
> >> +      'named', 'file', 'or', 'URL', 'or', 'connection')
> >>> dmp.size <- c(10^(1:7))
> >>> set.seed(37)
> >>>
> >>> for(i in dmp.size) {
> >> +    df0 <- data.frame(x=sample(sample.vec, i, replace=TRUE))
> >> +    dump('df0', file='testdump')
> >> +    cat('length:', i, '\n')
> >> +    print(system.time(source('testdump', keep.source = FALSE,
> >> +                             encoding='')))
> >> + }
> >> length: 10
> >>    user  system elapsed
> >>       0       0       0
> >> length: 100
> >>    user  system elapsed
> >>       0       0       0
> >> length: 1000
> >>    user  system elapsed
> >>       0       0       0
> >> length: 10000
> >>    user  system elapsed
> >>    0.01    0.00    0.01
> >> length: 1e+05
> >>    user  system elapsed
> >>    0.36    0.06    0.42
> >> length: 1e+06
> >>    user  system elapsed
> >>    6.02    1.86    7.88
> >> length: 1e+07
> >>    user  system elapsed
> >>  388.63  176.42  566.41
> >>>
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/big-speed-difference-in-source-btw-R-2-15-2-and-R-3-0-2-tp4679314p4679346.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
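
For the storage question raised in this thread, one commonly used alternative to dump()/source() is saveRDS()/readRDS(), which write and read a single object in R's binary serialization format. The sketch below is only an illustration under stated assumptions: the small data frame, the "label" attribute, and the temporary file names are hypothetical, and it makes no claim about how long the binary format will remain readable. It simply checks that a per-column attribute survives both routes.

```r
# Hypothetical example data: one character column carrying an extra attribute.
set.seed(37)
df0 <- data.frame(x = sample(c("source", "file", "URL"), 1000, replace = TRUE),
                  stringsAsFactors = FALSE)
attr(df0$x, "label") <- "sampled words"   # "label" is an illustrative name only

rds_file  <- tempfile(fileext = ".rds")
dump_file <- tempfile(fileext = ".R")

# Binary route: saveRDS() serializes one object, readRDS() returns it.
saveRDS(df0, rds_file)
df_rds <- readRDS(rds_file)

# Text route used in the thread: dump() writes R code, source() re-evaluates it.
# Sourcing into a fresh environment avoids overwriting df0 in the workspace.
dump("df0", file = dump_file)
env <- new.env()
source(dump_file, local = env, keep.source = FALSE)
df_dump <- env$df0

# Compare what came back with the original object.
print(identical(df0, df_rds))
print(attr(df_rds$x, "label"))
print(attr(df_dump$x, "label"))

unlink(c(rds_file, dump_file))
```

Both saveRDS() and save() take a `version` argument that selects the serialization format version; ?saveRDS lists the versions a given R release can write, which may matter when files have to be read by a different R version.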