Rprof() is definitely what you want to look at; a minimal sketch of a
profiling session follows below. Beyond that, a couple more pieces of
advice:

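A basic Rprof() session might look like this ("myscript.R" and
"profile.out" are just placeholders for your own script and output file):

Rprof("profile.out")          #start profiling; samples go to profile.out
source("myscript.R")          #run the code you want to profile
Rprof(NULL)                   #stop profiling
summaryRprof("profile.out")   #tabulate the time spent in each function
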
First, allocate lists/vectors at their full length whenever possible. Even
when the final length is unknown, do NOT grow a list one position at a time
inside a large loop. Instead, guess how long a list you will need, allocate
that, and then grow it in large chunks (never one position at a time) as
needed; see the sketch after the two examples below.

#Bad:
mylist <- vector("list", 1)
for(i in 1:10000)
{
  mylist[[i]] <- i  #this must add the next position to the list
                    #before it can be filled
}

#Good:
mylist <- vector("list", 10000)
for(i in 1:10000)
{
  mylist[[i]] <- i  #this simply fills the next position, which
                    #already exists
}
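
When the final length is genuinely unknown, one way to follow the
guess-and-grow advice is to double the allocation whenever it runs out.
This is only a sketch; more_items_available() and next_item() are
hypothetical stand-ins for your own code:

mylist <- vector("list", 1000)  #initial guess at the length needed
n <- 0                          #number of positions actually used
while(more_items_available())
{
  n <- n + 1
  if(n > length(mylist))
    length(mylist) <- 2 * length(mylist)  #grow in big chunks, never by one
  mylist[[n]] <- next_item()
}
mylist <- mylist[seq_len(n)]    #drop the unused tail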


Second, avoid lookups inside a loop/lapply whenever possible.

#Bad:
for(i in 1:4000)
{
  myobj$data[i] <- myobj$fun(i)  #the $ function is called
                                 #unnecessarily twice in this line
}

#Good:
myfun <- myobj$fun
mydata <- numeric(4000)
for(i in 1:4000)
{
  mydata[i] <- myfun(i)  #the $ function is not called inside the loop at all
}
myobj$data <- mydata
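
The same hoisting works if you use the apply family instead of an explicit
loop. A sketch, assuming myobj$fun takes an index and returns a single
numeric value:

myfun <- myobj$fun                               #one $ lookup, up front
myobj$data <- vapply(1:4000, myfun, numeric(1))  #no lookups per iteration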

Hope those help,
Gabe

On Mon, Jul 6, 2009 at 1:42 PM, Roger Peng <rdp...@gmail.com> wrote:

> My advice would be to use the profiler 'Rprof()' --- you may find that
> the loop is not really the problem. In my experience, there's
> relatively little difference between 'lapply' and a 'for' loop,
> although 'lapply' can be faster at times.
>
> -roger
>
> On Mon, Jul 6, 2009 at 4:26 AM, Thorn Thaler <thot...@sbox.tugraz.at> wrote:
> > Hi everybody,
> >
> > Currently I'm writing a package that, for a given family of variance
> > functions depending on a parameter theta, say, computes the extended
> > quasi-likelihood (eql) function for different values of theta.
> >
> > The computation involves a couple of calls of the 'glm' routine. What I'm
> > doing now is to call 'lapply' for a list of theta values and a function
> > that constructs a family object for the particular choice of theta,
> > computes the glm, and uses the results to get the eql. Not surprisingly,
> > the function is not very fast. Depending on the size of the parameter
> > space under consideration, it takes a couple of minutes until the
> > function finishes. Testing ~1000 parameters takes about 5 minutes on my
> > machine.
> >
> > I know that loops in R are slow more often than not. Thus, I thought
> > using 'lapply' would be a better way. But anyway, it is just another kind
> > of loop. Besides, it involves some overhead for the function calls, and
> > hence I'm not sure whether using 'lapply' is really the better choice.
> >
> > What I'd like to know is how to figure out where the bottleneck lies.
> > Vectorization would help, but since I don't think there is a vectorized
> > 'glm' function that can handle a vector of family objects, I'm not aware
> > of any choice aside from using a loop.
> >
> > So my questions:
> > - how can I figure out where the bottleneck lies?
> > - is 'lapply' always superior to a loop in terms of execution time?
> > - are there any 'evil' commands that should be avoided in a loop because
> > they slow down the computation?
> > - are there any good books or tutorials about how to profile R code
> > efficiently?
> >
> > Thanks in advance for your help,
> >
> > Thorn
>
> --
> Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/

