On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
> I'm working on a project related to document clustering. I know that R
> has clustering algorithms such as clara, but only supports two distance
> metrics: euclidian and manhattan, which are not very useful for
> clustering documents. I was wondering how easy it would be to extend the
> clustering package in R to support other distance metrics, such as
> cosine distance, or if there was an API for custom distance metrics.
>
You don't have to extend the "clustering package in R to support other
distance metrics". You should, however, take care to produce your
dissimilarities (or distances) in the standard format. Then they can be
used in the "clustering package", in cmdscale, in isoMDS, or in any other
function accepting a "dist" object. The "clustering package" will support
new dissimilarities if they are written in a standard-conforming way.

There are several packages that offer alternative dissimilarities (and
some even distances) that can be used in clustering functions. Look for
"distances" or "dissimilarities" on the R site. Some of these could be the
one for you... I would be surprised if the cosine index were missing (and
if needed, I could write it for you in C, but I don't think that is
necessary).
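Here is a minimal sketch of what "the standard format" means for a cosine dissimilarity. It is written in Python rather than R or C, assumes each document is already a term-count vector, and uses hypothetical function names. The code computes cosine dissimilarity (1 minus cosine similarity) for every pair of documents. It stores the results as the lower triangle of the dissimilarity matrix, column by column. That is the layout R uses for a "dist" object.

```python
import math


def cosine_dissimilarity(u, v):
    """1 - cosine similarity of two equal-length term vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        # Assumption: an all-zero document has no direction, so report NaN.
        return float("nan")
    return 1.0 - dot / (nu * nv)


def dist_vector(docs):
    """Lower triangle of the n x n dissimilarity matrix, column by column:
    d(2,1), d(3,1), ..., d(n,1), d(3,2), ..., d(n,n-1).
    This mirrors the storage order of R's "dist" objects."""
    n = len(docs)
    out = []
    for col in range(n):                 # smaller document index
        for row in range(col + 1, n):    # larger document index
            out.append(cosine_dissimilarity(docs[row], docs[col]))
    return out


def dist_index(i, j, n):
    """0-based position of d(i, j) in that vector, for 1-based i < j <= n.
    Uses the 1-based formula n*(i-1) - i*(i-1)/2 + j - i from R's ?dist, minus 1."""
    return n * (i - 1) - i * (i - 1) // 2 + j - i - 1


if __name__ == "__main__":
    docs = [[1, 0], [2, 0], [0, 3]]   # three toy "documents" over two terms
    d = dist_vector(docs)
    print(d)                          # d(2,1), d(3,1), d(3,2)
    print(d[dist_index(2, 3, 3)])     # look up d(2,3)
```

Documents 1 and 2 point the same way, so their dissimilarity is 0. Document 3 is orthogonal to both. Inside R you would not normally handle this vector layout by hand. The usual route is to build the full dissimilarity matrix and convert it with `as.dist()`. The result can then be passed to functions that accept a "dist" object, such as `cmdscale` or `hclust`.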
cheers, jari oksanen

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html