Niels Richard Hansen wrote:
Consider the following little "benchmark"

 > require(Matrix)
 > tmp <- Matrix(c(rep(1,1000),rep(0,9000)),ncol=1)
 > ind <- sample(1:10000,10000)
 > system.time(tmp[ind,])
   user  system elapsed
  0.004   0.001   0.005

 > ind <- sample(1:1000,10000,replace=TRUE)
 > system.time(tmp[ind,])
   user  system elapsed
  0.654   0.006   0.703

 > system.time(Matrix(as(tmp,"matrix")[ind,]))
   user  system elapsed
  0.005   0.000   0.006

First I access all 10000 rows in a random order, which is fast,
but when I access the first 1000 rows 10000 times there is a
considerable slowdown. Last I convert back and forth
between matrix and Matrix and get a serious speedup. Am I missing
a point here? Should I not use indexing with "[" for the
sparse matrices if I have repeated indices?

I'm running Mac OS X, version 10.5.6, with Matrix package
version 0.999375-21.

I hope that somebody can enlighten me on this issue.

Thanks, Niels

The sources have the answer, but I'm as reluctant to read them as you are. ;-)

The repeated indices are certainly an important part of it. Notice also that you'll have timings like

> ind <- sample(1:10000,10000,replace=TRUE)
> system.time(tmp[ind,])
   user  system elapsed
  0.884   0.000   1.302
> ind <- sample(1:1000,10000,replace=TRUE)
> system.time(tmp[ind,])
   user  system elapsed
  2.053   0.009   2.268
> ind <- sample(1:10000,10000,replace=FALSE)
> system.time(tmp[ind,])
   user  system elapsed
   0.01    0.00    0.01

It is, however, apparently unrelated to the sparseness of the result (sampling from 1001:2000 gives the same result).

Also
> ind <- sample(1:5000,5000,replace=FALSE)
> ind <- c(ind,ind)
> system.time(tmp[ind,])
   user  system elapsed
  1.204   0.001   1.331

has a considerable part of the slowdown, as does

ind <- c(1:5000,1:5000)

Presumably the issue is that calculations on sparseness patterns are harder when there are repeated indices.

--
   O__  ---- Peter Dalgaard             Ă˜ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk)              FAX: (+45) 35327907

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to