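Please find attached a small patch to improve the performance of as.matrix.dist(). It is slightly more involved than the current code: it fills both triangles by indexing with two-column integer matrices, rather than building a logical mask with row(df) > col(df) and transposing. This gives a worthwhile speed-up for larger <dist> objects while remaining comparable for smaller ones.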

Example:

set.seed(1)
dat <- matrix(rnorm(20000), ncol = 2)
system.time(as.matrix(dist(dat)))

As of r84931:

   user  system elapsed
  3.370   1.154   4.535

With this patch:

   user  system elapsed
  1.925   0.754   2.685
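
For anyone who wants to try the idea without rebuilding R, here is a rough standalone sketch of the same indexing approach. The name as_matrix_dist_idx is made up for illustration and is not part of R or of the patch; the body simply mirrors the patched lines and should be read as a sketch rather than the final code.

```r
## Hypothetical helper (illustration only): mirrors the indexing used in the patch.
as_matrix_dist_idx <- function(x) {
    size <- attr(x, "Size")
    df <- matrix(0, size, size)
    idx <- seq_len(size)
    ## Row indices of the lower triangle, column by column: 2..n, 3..n, ..., n
    d1 <- unlist(lapply(idx[-1L], seq.int, to = size, by = 1L))
    ## Matching column indices: 1 repeated n-1 times, 2 repeated n-2 times, ...
    d2 <- rep.int(idx[-size], times = rev(idx[-size]))
    df[cbind(d1, d2)] <- x  # lower triangle, preserving NAs in x
    df[cbind(d2, d1)] <- x  # upper triangle, same values mirrored
    labels <- attr(x, "Labels")
    dimnames(df) <-
        if (is.null(labels)) list(idx, idx) else list(labels, labels)
    df
}

## Small check against the current implementation
set.seed(1)
d <- dist(matrix(rnorm(20), ncol = 2))
stopifnot(isTRUE(all.equal(as_matrix_dist_idx(d), as.matrix(d))))
```

Because d1 and d2 enumerate the lower triangle in column-major order, they line up with the order in which a <dist> object stores its values, so no transpose is needed.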

I am submitting this here in the first instance, but am happy to move it to Bugzilla if that is more appropriate.

Cheers

Tim
Index: src/library/stats/R/dist.R
===================================================================
--- src/library/stats/R/dist.R	(revision 84931)
+++ src/library/stats/R/dist.R	(working copy)
@@ -49,10 +49,13 @@
 {
     size <- attr(x, "Size")
     df <- matrix(0, size, size)
-    lower <- row(df) > col(df)
+    idx <- seq_len(size)
+    d1 <- unlist(lapply(idx[-1L], seq.int, to = size, by = 1L))
+    d2 <- rep.int(idx[-size], times = rev(idx[-size]))
+    lower <- cbind(d1,d2)
+    upper <- cbind(d2,d1)
     df[lower] <- x ## preserving NAs in x
-    df <- t(df)
-    df[lower] <- x
+    df[upper] <- x
     labels <- attr(x, "Labels")
     dimnames(df) <-
 	if(is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels,labels)