Rprof() is definitely what you want to look at.
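To give a concrete idea of what a profiling session looks like, here is a minimal sketch (run_eql() is just a placeholder for whatever top-level call you want to profile):

  Rprof("eql.prof")          # start the sampling profiler, writing samples to eql.prof
  result <- run_eql()        # placeholder: the computation being profiled
  Rprof(NULL)                # stop profiling
  summaryRprof("eql.prof")   # tabulate time by function

summaryRprof() reports both "by.total" and "by.self" times; the "by.self" table in particular should tell you whether the glm calls themselves dominate, or whether the time is going somewhere less obvious.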
A couple of other pieces of advice:

First, allocate lists/vectors all at once whenever possible. Even when that is not possible, do NOT allocate lists one position at a time inside large loops. If you can, guess how long a list you will need, allocate that, and then grow it (not by one position at a time) as needed; one way of doing that is sketched after the example below.

#Bad:
mylist <- vector("list", 1)
for(i in 1:10000) {
  mylist[[i]] <- i  # this must extend the list by one position before it can be filled
}

#Good:
mylist <- vector("list", 10000)
for(i in 1:10000) {
  mylist[[i]] <- i  # this simply fills the next position, which already exists
}
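If you really cannot guess the final length up front, a common pattern is to grow the list geometrically and truncate at the end. A sketch, where has_next() and next_item() are hypothetical stand-ins for your own data source:

mylist <- vector("list", 256)               # initial guess at the length
n <- 0
while(has_next()) {                         # hypothetical: more data to process?
  n <- n + 1
  if(n > length(mylist))
    length(mylist) <- 2 * length(mylist)    # double the list rather than grow it by one
  mylist[[n]] <- next_item()                # hypothetical: produce the next element
}
length(mylist) <- n                         # drop the unused tail

Assigning to length() on a list pads it with NULLs, so each doubling is a single allocation instead of thousands of one-element extensions.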
Second, avoid lookups inside a loop (or lapply) whenever possible.

#Bad:
for(i in 1:4000) {
  myobj$data[i] <- myobj$fun(i)  # the $ function is called twice on every iteration, unnecessarily
}

#Good:
myfun <- myobj$fun
mydata <- numeric(4000)
for(i in 1:4000) {
  mydata[i] <- myfun(i)  # the $ function is not called at all inside the loop
}
myobj$data <- mydata

Hope those help,
Gabe

On Mon, Jul 6, 2009 at 1:42 PM, Roger Peng <rdp...@gmail.com> wrote:
> My advice would be to use the profiler 'Rprof()' --- you may find that
> the loop is not really the problem. In my experience, there's
> relatively little difference between 'lapply' and a 'for' loop,
> although 'lapply' can be faster at times.
>
> -roger
>
> On Mon, Jul 6, 2009 at 4:26 AM, Thorn Thaler <thot...@sbox.tugraz.at> wrote:
> > Hi everybody,
> >
> > Currently I'm writing a package that, for a given family of variance
> > functions depending on a parameter theta, computes the extended
> > quasi-likelihood (EQL) function for different values of theta.
> >
> > The computation involves a couple of calls to the 'glm' routine. What
> > I'm doing now is calling 'lapply' with a list of theta values and a
> > function that constructs a family object for the particular choice of
> > theta, computes the glm, and uses the results to get the EQL. Not
> > surprisingly, the function is not very fast. Depending on the size of
> > the parameter space under consideration, it takes a couple of minutes
> > to finish. Testing ~1000 parameters takes about 5 minutes on my
> > machine.
> >
> > I know that loops in R are slow more often than not, so I thought
> > using 'lapply' would be better. But it is just another kind of loop,
> > and it involves some overhead for the function call, so I'm not sure
> > whether 'lapply' is really the better choice.
> >
> > What I would like to know is how to figure out where the bottleneck
> > lies. Vectorization would help, but I don't think there is a
> > vectorized 'glm' function that can handle a vector of family objects,
> > so I'm not aware of any option other than a loop.
> >
> > So my questions:
> > - how can I figure out where the bottleneck lies?
> > - is 'lapply' always superior to a loop in terms of execution time?
> > - are there any 'evil' commands that should be avoided in a loop
> >   because they slow down the computation?
> > - are there any good books or tutorials about how to profile R code
> >   efficiently?
> >
> > TIA 4 ur help,
> >
> > Thorn
>
> --
> Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel