Niels Richard Hansen wrote:
Consider the following little "benchmark"
> require(Matrix)
> tmp <- Matrix(c(rep(1,1000),rep(0,9000)),ncol=1)
> ind <- sample(1:10000,10000)
> system.time(tmp[ind,])
user system elapsed
0.004 0.001 0.005
> ind <- sample(1:1000,10000,replace=TRUE)
> system.time(tmp[ind,])
user system elapsed
0.654 0.006 0.703
> system.time(Matrix(as(tmp,"matrix")[ind,]))
user system elapsed
0.005 0.000 0.006
First I access all 10000 rows in a random order, which is fast,
but when I draw 10000 indices with replacement from the first 1000
rows there is a considerable slowdown. Finally, I convert to a dense
matrix, index that, and convert back to Matrix, and get a serious
speedup. Am I missing a point here? Should I avoid indexing with "["
on sparse matrices when I have repeated indices?
I'm running Mac OS X, version 10.5.6, with Matrix package
version 0.999375-21.
I hope that somebody can enlighten me on this issue.
Thanks, Niels
The sources have the answer, but I'm as reluctant to read them as you
are. ;-)
The repeated indices are certainly an important part of it. Notice also
that you'll have timings like
> ind <- sample(1:10000,10000,replace=TRUE)
> system.time(tmp[ind,])
user system elapsed
0.884 0.000 1.302
> ind <- sample(1:1000,10000,replace=TRUE)
> system.time(tmp[ind,])
user system elapsed
2.053 0.009 2.268
> ind <- sample(1:10000,10000,replace=FALSE)
> system.time(tmp[ind,])
user system elapsed
0.01 0.00 0.01
It is, however, apparently unrelated to the sparseness of the result
(sampling from 1001:2000, where all entries are zero, gives the same
timings).
Also
> ind <- sample(1:5000,5000,replace=FALSE)
> ind <- c(ind,ind)
> system.time(tmp[ind,])
user system elapsed
1.204 0.001 1.331
shows a considerable part of the slowdown, as does
> ind <- c(1:5000,1:5000)
Presumably the issue is that calculations on sparseness patterns are
harder when there are repeated indices.
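If repeated indices are indeed the culprit, one way to sidestep "[" without going through a dense matrix is to select rows by multiplying with a sparse 0/1 selection matrix. The following is only a sketch under assumptions: select_rows() is a hypothetical helper, not part of the Matrix package, it relies on sparseMatrix() being available in the installed Matrix version, and I have not checked whether it is actually faster on the version discussed here. The timings at the end are meant to be run and compared locally.

```r
library(Matrix)

set.seed(1)  # make the sampled indices reproducible
tmp <- Matrix(c(rep(1, 1000), rep(0, 9000)), ncol = 1, sparse = TRUE)
ind <- sample(1:1000, 10000, replace = TRUE)  # repeated indices

# Hypothetical helper: row k of the selection matrix has a single 1 in
# column ind[k], so (sel %*% m)[k, ] equals m[ind[k], ].
select_rows <- function(m, ind) {
  sel <- sparseMatrix(i = seq_along(ind), j = ind, x = 1,
                      dims = c(length(ind), nrow(m)))
  sel %*% m
}

by_index   <- tmp[ind, , drop = FALSE]
by_product <- select_rows(tmp, ind)
by_dense   <- Matrix(as(tmp, "matrix")[ind, , drop = FALSE], sparse = TRUE)

# All three routes should agree on the values.
stopifnot(all(as.matrix(by_index) == as.matrix(by_product)),
          all(as.matrix(by_index) == as.matrix(by_dense)))

# Compare the timings on your own installation.
system.time(tmp[ind, , drop = FALSE])
system.time(select_rows(tmp, ind))
```

The product keeps everything sparse, which matters when the matrix is too large for the dense round trip used in the original post.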
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalga...@biostat.ku.dk) FAX: (+45) 35327907