Based on a discussion on SO I ran some tests and found that converting to a factor is best done early in the process. Hence, I propose to rewrite the gl() function as :
gl2 <- function(n, k, length = n * k, labels = 1:n, ordered = FALSE){ rep( rep( factor(1:n,levels=1:n,labels=labels, ordered=ordered),rep.int(k,n) ),length.out=length ) } Some test results : > system.time(X1 <- gl(5,1e7)) user system elapsed 29.21 0.30 29.58 > system.time(X2 <- gl2(5,1e7)) user system elapsed 1.87 0.45 2.37 > all.equal(X1,X2) [1] TRUE > system.time(X1 <- gl(5,100,1e7)) user system elapsed 5.98 0.05 6.05 > system.time(X2 <- gl2(5,100,1e7)) user system elapsed 0.21 0.03 0.25 > all.equal(X1,X2) [1] TRUE > system.time(X1 <- gl(5,100,1e7,labels=letters[1:5])) user system elapsed 5.88 0.02 5.98 > system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5])) user system elapsed 0.20 0.05 0.25 > all.equal(X1,X2) [1] TRUE > system.time(X1 <- gl(5,100,1e7,labels=letters[1:5],ordered=T)) user system elapsed 5.82 0.03 5.89 > system.time(X2 <- gl2(5,100,1e7,labels=letters[1:5],ordered=T)) user system elapsed 0.22 0.04 0.25 > all.equal(X1,X2) [1] TRUE reference to SO : http://stackoverflow.com/questions/5627264/how-can-i-efficiently-construct-a-very-long-factor-with-few-levels -- Joris Meys Statistical consultant Ghent University Faculty of Bioscience Engineering Department of Applied mathematics, biometrics and process control tel : +32 9 264 59 87 joris.m...@ugent.be ------------------------------- Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel