You are discovering part of the overhead of using a data frame. The way you specify the subset of the data frame to replace matters somewhat:

> st <- rep(1,1e4)
> ed <- rep(2,1e4)
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df[i,1] <- df[i,2], gcFirst=TRUE)
[1] 35.96 0.10 36.37 NA NA
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df[[1]][i] <- df[[2]][i], gcFirst=TRUE)
[1] 22.63 0.17 22.88 NA NA
> df <- data.frame(start=st, end=ed)
> system.time(for (i in 1:dim(df)[1]) df$start[i] <- df$end[i], gcFirst=TRUE)
[1] 19.29 0.13 19.46 NA NA

If you have all numeric data, you might as well use a matrix instead of a data frame:

> m <- cbind(start=st, end=ed)
> str(m)
 num [1:10000, 1:2] 2 2 2 2 2 2 2 2 2 2 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "start" "end"
> system.time(for (i in 1:nrow(df)) m[i,1] <- m[i,2], gcFirst=TRUE)
[1] 0.06 0.00 0.08 NA NA

Andy

> From: Firas Swidan
>
> Hi,
> I am experiencing a long delay when using dataframes inside
> loops and was wondering if this is a bug or not.
> Example code:
>
> > st <- rep(1,100000)
> > ed <- rep(2,100000)
> > for(i in 1:length(st)) st[i] <- ed[i] # works fine
> > df <- data.frame(start=st,end=ed)
> > for(i in 1:dim(df)[1]) df[i,1] <- df[i,2] # takes forever
>
> R: R 2.0.0 (2004-10-04)
> OS: Linux, Fedora Core 2
> kernel: 2.6.10-1.14_FC2
> cpu: AMD Athlon XP 1600.
> mem: 500MB.
>
> The example above is only to illustrate the problem. I need
> loops to apply some functions on pairs (not necessarily
> successive) of rows in a dataframe.
>
> Thankful for any advice,
> Firas.
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
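
As a follow-up to the timings in the reply, here is a minimal sketch of two ways to keep the data frame out of the inner loop. It is not from the original exchange: the names `df_vec`, `df_loop`, `start` and `end` are made up for illustration, and no timings are claimed for it. The idea is to touch the data frame only once, either with a whole-column assignment or by looping over plain vectors pulled out of the frame and writing the result back afterwards.

```r
# Illustrative data with the same shape as the thread's example.
n <- 1e4
df <- data.frame(start = rep(1, n), end = rep(2, n))

# Option 1: whole-column assignment. For a plain copy like the
# one in the example, no loop is needed at all.
df_vec <- df
df_vec$start <- df_vec$end

# Option 2: if a row-wise loop really is needed, extract the
# columns as ordinary vectors, loop over those, and assign the
# result back to the data frame a single time.
df_loop <- df
start <- df_loop$start
end <- df_loop$end
for (i in seq_len(nrow(df_loop))) {
  start[i] <- end[i]
}
df_loop$start <- start

# Both approaches should leave the frame in the same state.
stopifnot(identical(df_vec, df_loop))
```

The second form probably fits the "pairs of rows" case better, since the loop body can index the extracted vectors with any `i` and `j` without going through data frame subassignment on every iteration.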
