On Tue, 13 Sep 2005, Jari Oksanen wrote:

> On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
> > I'm working on a project related to document clustering. I know that R
> > has clustering algorithms such as clara, but only supports two distance
> > metrics: Euclidean and Manhattan, which are not very useful for
> > clustering documents. I was wondering how easy it would be to extend the
> > clustering package in R to support other distance metrics, such as
> > cosine distance, or if there was an API for custom distance metrics.
> >
> You don't have to extend the "clustering package in R to support other
> distance metrics", but you should take care that you produce your
> dissimilarities (or distances) in the standard format so that they can
> be used in "clustering package" or in cmdscale or in isoMDS or any other
> function accepting a "dist" object. "Clustering package" will support
> new dissimilarities if they are written in a standard-conforming way.
> There are several packages that offer alternative dissimilarities (and
> some even distances) that can be used in clustering functions. Look for
> "distances" or "dissimilarities" in the R Site. Some of these could be
> the one for you... I would be surprised if cosine index is missing (and
> if needed, I could write it for you in C, but I don't think that is
> necessary).
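As a rough illustration of the point about the standard format, here is a minimal sketch in plain R. The toy matrix `tf` and the helper `cosine_dist` are made up for this example and do not come from any package; the sketch also assumes that no document row is all zeros. Note that clara itself works on the raw data matrix, so pam (from the same cluster package) and hclust are used here, since both accept a "dist" object.

```r
# Hypothetical toy term-frequency matrix: one row per document, one column per term.
tf <- matrix(c(2, 0, 1,
               4, 0, 2,
               0, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("doc1", "doc2", "doc3"),
                             c("apple", "pear", "plum")))

# Illustrative helper (not part of any package): cosine dissimilarity,
# i.e. 1 - cosine similarity, between all pairs of rows.
# Assumption: every row has at least one non-zero entry.
cosine_dist <- function(x) {
  norms <- sqrt(rowSums(x^2))                 # Euclidean length of each row
  sim <- tcrossprod(x) / outer(norms, norms)  # pairwise cosine similarities
  # Clamp tiny negative values caused by rounding, then convert the
  # full matrix to the standard "dist" format.
  as.dist(pmax(1 - sim, 0))
}

d <- cosine_dist(tf)

# The resulting "dist" object can be passed to functions that accept dissimilarities.
hc <- hclust(d, method = "average")

library(cluster)
pm <- pam(d, k = 2)
```

The same `d` could equally be handed to cmdscale or isoMDS, as mentioned in the quoted text above.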
The standard dist format can be generated from a distance matrix m simply with as.dist(m).

Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
www.homepages.ucl.ac.uk/~ucakche
