Dear all

I was thinking about efficiently reading data into R and ran several tests to see whether load('file.Rdata') or readRDS('file.rds') is faster. The files file.Rdata and file.rds contain the same data; the first was created with save(d, file = 'file.Rdata', compress = FALSE) and the second with saveRDS(d, 'file.rds', compress = FALSE).
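
For reference, a minimal sketch of how such a pair of files could be created; the actual object d is not shown in this post, so a small dummy data frame stands in for it here:

```r
# Minimal setup sketch: 'd' is a small dummy data frame standing in for
# the real (unshown) object; both files hold the same data, uncompressed.
d <- data.frame(x = rnorm(1e5), y = runif(1e5))

save(d, file = "file.Rdata", compress = FALSE)   # read back with load()
saveRDS(d, file = "file.rds", compress = FALSE)  # read back with readRDS()
```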

First I used the function microbenchmark() and was astonished by the max
value in the output.

FIRST TEST:
> library(microbenchmark)
> microbenchmark(
+   n <- readRDS('file.rds'),
+   load('file.Rdata')
+ )
Unit: milliseconds
                     expr      min       lq     mean   median       uq       max neval
 n <- readRDS('file.rds') 106.5956 109.6457 237.3844 117.8956 141.9921 10934.162   100
       load('file.Rdata') 295.0654 301.8162 335.6266 308.3757 319.6965  1915.706   100

It looks like the max value is an outlier.

So I tried:
SECOND TEST:
> sapply(1:10, function(x) system.time(n <- readRDS('file.rds'))[3])
elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed
  10.50    0.11    0.11    0.11    0.10    0.11    0.11    0.11    0.12    0.12
> sapply(1:10, function(x) system.time(load('file.Rdata'))[3])
elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed elapsed
   1.86    0.29    0.31    0.30    0.30    0.31    0.30    0.29    0.31    0.30

This confirmed my suspicion: loading the data the first time takes much longer 
than the following times. I suspect this has something to do with how the data 
is assigned, and that R doesn't have to 'fully' read the data when it is read a 
second time.

So the question remains: how can I make a realistic benchmark? From the 
first test I would conclude that reading the *.rds file is faster. But this 
holds only for a large number of evaluations (neval). If I set times = 1, then 
reading the *.Rdata file would be faster (as the second test also indicates).
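
One way this could be made fairer, perhaps, is to perform an untimed read of each file first (so the slow cold first read is excluded) and then compare medians, which are robust to outliers. A sketch, with a small dummy data set written first so it runs on its own:

```r
# Sketch: exclude the slow cold first read by reading each file once,
# untimed, before benchmarking; then compare medians rather than means.
# A small dummy 'd' is written here so the example is self-contained.
d <- data.frame(x = rnorm(1e5))
save(d, file = "file.Rdata", compress = FALSE)
saveRDS(d, file = "file.rds", compress = FALSE)

library(microbenchmark)
invisible(readRDS("file.rds"))   # untimed warm-up reads
invisible(load("file.Rdata"))

res <- microbenchmark(
  rds   = n <- readRDS("file.rds"),
  rdata = load("file.Rdata"),
  times = 50
)
summary(res)[, c("expr", "median")]
```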

Thanks for any help or comments.

Kind regards

Raphael
------------------------------------------------------------------------------------
Raphael Felber, PhD
Scientific Officer, Climate & Air Pollution

Federal Department of Economic Affairs,
Education and Research EAER
Agroscope
Research Division, Agroecology and Environment

Reckenholzstrasse 191, CH-8046 Zürich
Phone +41 58 468 75 11
Fax     +41 58 468 72 01
raphael.fel...@agroscope.admin.ch<mailto:raphael.fel...@agroscope.admin.ch>
www.agroscope.ch<http://www.agroscope.ch/>



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
