New to R and having issues with loops. I am aware that I should use vectorization whenever possible and use the apply functions; however, sometimes a loop seems necessary.
I have a data set of 2 million rows and have run a couple of loops of varying complexity to test efficiency. A very simple loop, such as adding up every item in a column, gives me an answer quickly. A loop containing a nested ifelse statement, on the other hand, took 13 minutes on just 50,000 rows.

I am aware of a few ways to speed up loops: preallocate memory, and compute as much as possible outside the loop (or wrap the computation in a function and just loop over that function). But it seems that even with these speed-ups I may have too much data to run loops at all.

Here is the loop that took 13 minutes. I realize I can accomplish the same goal with vectorization (and in fact did so); a rough sketch of that approach is appended below.

    y <- numeric(length(x))                 # preallocate the result
    for (i in 1:length(x))
      ifelse(!is.na(x[i]),
             y[i] <- x[i],                  # keep x[i] when it is not missing
             ifelse(strataID[i + 1] == strataID[i],
                    y[i] <- x[i + 1],       # fill from the next row in the same stratum
                    y[i] <- x[i - 1]))      # otherwise fill from the previous row

Presumably, more complicated loops would be even more expensive than the nested ifelse above. If I write more efficient loops the time will come down, but I wonder whether I will ever be able to write code efficient enough to run a complicated loop over 2 million rows in a reasonable time. Is it useless for me to try any complicated loops on 2 million rows, or will it become manageable, even for complicated situations, if I get much better at programming in R?

Jay
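P.S. For reference, the vectorized version looks roughly like the following (a sketch, assuming x and strataID are equal-length vectors; the last row, where strataID[i + 1] does not exist, is simply left NA here rather than handled specially):

    n <- length(x)
    next_x <- c(x[-1], NA)                               # x[i + 1] for every row
    prev_x <- c(NA, x[-n])                               # x[i - 1] for every row
    same_stratum <- c(strataID[-1] == strataID[-n], NA)  # strataID[i + 1] == strataID[i]

    # keep x where it is present; otherwise take the next value within the
    # stratum, falling back to the previous value when the stratum changes
    y <- ifelse(!is.na(x), x, ifelse(same_stratum, next_x, prev_x))

This does the work in a handful of whole-vector operations instead of 50,000 iterations, which is why it is so much faster than the explicit loop.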