Re: [R] Lookups in R

jim holtman Thu, 05 Jul 2007 04:51:10 -0700

You are getting two very different results in what you are comparing.

> system.time(lapply(1:10^4, mean))
  user  system elapsed
  1.31    0.00    1.31
is returning a list with 10,000 values in it, so it is taking time to
allocate the space and build that list.


> system.time(for(i in 1:10^4) mean(i))
  user  system elapsed
  0.33    0.00    0.32
is just evaluating mean(i) on each pass and discarding the result, so it is
not having to allocate space and set up the structure for a list.  Typically
you use 'lapply' not only for 'looping', but more importantly for returning
the values associated with the processing.
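
To make the difference concrete, here is a minimal sketch (not from the
original post) of what each form leaves behind. The names res_list and
res_loop are made up for illustration. For a like-for-like timing, the loop
would also have to store its results.

```r
# lapply() collects every result into a list of length 10^4
res_list <- lapply(1:10^4, mean)
length(res_list)   # 10000

# A bare for loop evaluates mean(i) and throws each result away
for (i in 1:10^4) mean(i)

# To make the loop do comparable work, preallocate a list and fill it
res_loop <- vector("list", 10^4)
for (i in 1:10^4) res_loop[[i]] <- mean(i)

# Both approaches should now hold the same values
identical(res_list, res_loop)
```

Wrapping the first and third forms in system.time() should give a fairer
comparison than the two timings above, though the numbers will depend on
the R version and machine.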

So again, the timing will depend on what you are doing.  If you have a
large transaction table that you want to consolidate by applying some
processing per userID, then lapply will probably be very efficient for that.
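
As a rough sketch of that kind of consolidation (the data frame trans, the
user IDs, and the column names are invented for illustration and are not
from the thread), split() can group the amounts by userID and lapply() can
process each group. This should work even when the IDs are not sequential,
since the groups are keyed by the ID values themselves.

```r
set.seed(1)                      # made-up data, for illustration only
ids <- c("u17", "u03", "u42")    # hypothetical non-sequential user IDs
trans <- data.frame(
  userID = sample(ids, 20, replace = TRUE),
  amt    = runif(20)
)

# split() groups the amounts by userID; lapply() summarises each group
by_user <- lapply(split(trans$amt, trans$userID),
                  function(x) c(amt = sum(x), n = length(x)))

# Bind the per-user summaries into a matrix with one row per userID
result <- do.call(rbind, by_user)
result

# The per-user totals can also be computed directly with rowsum()
totals <- rowsum(trans$amt, trans$userID)
```

Here there is no explicit loop over transactions at all; whether this is
faster than the hashed-environment loop quoted below would need to be timed
on the real data.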


On 7/5/07, Michael Frumin <[EMAIL PROTECTED]> wrote:
>
> the problem I have is that userid's are not just sequential from
> 1:n_users.  if they were, of course I'd have made a big matrix that was
> n_users x n_fields and that would be that.  but, I think what I can do is
> just use the hash to store the index into the result matrix, nothing
> more. then the rest of it will be easy.
>
> but please tell me more about eliminating loops.  In many cases in R I
> have used lapply and derivatives to avoid loops, but in this case they
> seem to give me extra overhead simply by the generation of their result
> lists:
>
> > system.time(lapply(1:10^4, mean))
>   user  system elapsed
>   1.31    0.00    1.31
> > system.time(for(i in 1:10^4) mean(i))
>   user  system elapsed
>   0.33    0.00    0.32
>
>
> thanks,
> mike
>
>
> > I don't think that's a fair comparison--- much of the overhead comes
> > from the use of data frames and the creation of the indexing vector. I
> > get
> >
> > > n_accts <- 10^3
> > > n_trans <- 10^4
> > > t <- list()
> > > t$amt <- runif(n_trans)
> > > t$acct <- as.character(round(runif(n_trans, 1, n_accts)))
> > > uhash <- new.env(hash=TRUE, parent=emptyenv(), size=n_accts)
> > > for (acct in as.character(1:n_accts)) uhash[[acct]] <- list(amt=0, n=0)
> > > system.time(for (i in seq_along(t$amt)) {
> > +     acct <- t$acct[i]
> > +     x <- uhash[[acct]]
> > +     uhash[[acct]] <- list(amt=x$amt + t$amt[i], n=x$n + 1)
> > + }, gcFirst = TRUE)
> >    user  system elapsed
> >   0.508   0.008   0.517
> > > udf <- matrix(0, nrow = n_accts, ncol = 2)
> > > rownames(udf) <- as.character(1:n_accts)
> > > colnames(udf) <- c("amt", "n")
> > > system.time(for (i in seq_along(t$amt)) {
> > +     idx <- t$acct[i]
> > +     udf[idx, ] <- udf[idx, ] + c(t$amt[i], 1)
> > + }, gcFirst = TRUE)
> >    user  system elapsed
> >   1.872   0.008   1.883
> >
> > The loop is still going to be the problem for realistic examples.
> >
> > -Deepayan
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


