Several of the methods I use for analyzing large data sets, such as

 WinGamma: determining the level of noise in data
 Relief-F: estimating the influence of variables

depend on finding the k nearest neighbors of a point in a data frame or
matrix efficiently. (For large data sets it is not feasible to compute
the 'dist' matrix anyway.)

Seeing the proposed solution to "[R] distance between two matrices"
last month for finding _one_ nearest neighbor I came up with a solution
'nearest(A, n, k)' as appended.

Still, this is very clumsy and slow if you have to find the 3 nearest
neighbors for 1000 points in a data frame with 100000 entries at least
-- about 2 secs per data point on my computer or half an hour for an
application from real life.

Are there better/faster ways to perform this task using R functions?
Even better, is there a free implementation of kd-trees that I could
utilize (the one I found did not conform to the ANSI C standard)?

Someome pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh'
only accepts 2D data points.

Hans Werner Borchers
ABB Corporate Research, Germany
________________________________________________________________________

require (class)
nearest <- function (X, n, k=3)
##  Find k nearest neighbors of X[n, ] in the data frame
##  or matrix X, utilizing function knn1 k-times.
{
   N <- nrow(X)
   # inds contains the indices of nearest neighbors
   inds <- c(n); i <- 0
   while (i < k) {
       # use knn1 for one index...
       j  <- as.integer(knn1(X [-inds, ], X[n, ], 1:(N-length(inds))))
       # ...and change to true index of neighbor
       inds <- c(inds, setdiff(1:N, inds)[j])
       i <- i+1
   }
   # return nearest neighbor indices (without n, of course)
   return(inds[-1])
}

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Reply via email to