Dear All,

A long time ago I ran a cluster analysis in which the dissimilarity matrix consisted of Dmax (Kolmogorov-Smirnov distance) values, in other words the maximum absolute difference between two cumulative proportion curves. This worked very well indeed. The matrix was calculated using dBase III+, which took a day and a half; the clustering was done using MV-ARCH, and the resulting dendrogram was converted from HP plotter language to PostScript by hand. As you might guess, I'd like to be able to do this more efficiently in R.
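As a language-neutral illustration of the Dmax matrix itself, here is a minimal Python sketch. It assumes each item is recorded as counts over the same ordered set of categories, so that its cumulative proportion curve is the running total divided by the item's overall total; `cumulative_proportions`, `dmax` and `dmax_matrix` are hypothetical helper names. If the items are instead raw (ungrouped) values, R's `ks.test(x, y)$statistic` returns the two-sample D for a single pair.

```python
from itertools import accumulate


def cumulative_proportions(counts):
    """Turn counts over ordered categories into a cumulative proportion curve."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("each item needs a positive total count")
    return [c / total for c in accumulate(counts)]


def dmax(counts_a, counts_b):
    """Maximum absolute difference between two cumulative proportion curves."""
    if len(counts_a) != len(counts_b):
        raise ValueError("items must share the same ordered categories")
    curve_a = cumulative_proportions(counts_a)
    curve_b = cumulative_proportions(counts_b)
    return max(abs(a - b) for a, b in zip(curve_a, curve_b))


def dmax_matrix(items):
    """Symmetric matrix of pairwise Dmax values (zeros on the diagonal).

    Hypothetical helper: only the upper triangle is computed, then mirrored.
    """
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dmax(items[i], items[j])
    return d


if __name__ == "__main__":
    # Hypothetical counts for three items over four ordered categories.
    items = [[10, 20, 30, 40], [40, 30, 20, 10], [10, 20, 30, 40]]
    for row in dmax_matrix(items):
        print([round(x, 2) for x in row])
```

With 200 items this is about 20,000 pairwise comparisons, each a single pass over the categories, so it should take seconds rather than days.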
I have looked through the various help files and found that some of the clustering routines will accept a dissimilarity matrix as input (yay!). My questions, as a very novice R user, are:

a) How would one go about calculating the matrix of Dmax/KS distance values?

b) Of the many clustering packages (I'll be doing a simple average-link hierarchical clustering), is there one where I can ask: "If I 'cut' the dendrogram at a dissimilarity level of 0.x, which items are in which clusters?" With over 200 items in my dataset, working this out by hand is non-trivial.

Many thanks indeed for your help.

Kris Lockyear
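For question b), in R the usual route is `tree <- hclust(as.dist(m), method = "average")` followed by `cutree(tree, h = 0.x)`, which returns a group number for each item. To show what such a cut does, here is a minimal, deliberately naive Python sketch of average-link (UPGMA) clustering on a precomputed dissimilarity matrix; `average_link_cut` is a hypothetical name. It stops merging as soon as the closest pair of clusters is farther apart than `h`. Because average-link merge heights never decrease, this yields the same groups as cutting the full dendrogram at height `h` (apart from possible tie-breaking differences).

```python
def average_link_cut(d, h):
    """Hypothetical helper: average-link clustering of the square dissimilarity
    matrix d, keeping only merges at height <= h.

    Returns one cluster label (1, 2, ...) per item, numbered in order of each
    cluster's first item.
    """
    n = len(d)
    clusters = [[i] for i in range(n)]

    def avg(a, b):
        # Mean dissimilarity over all cross-cluster pairs of items.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dist = avg(clusters[p], clusters[q])
                if best is None or dist < best[0]:
                    best = (dist, p, q)
        dist, p, q = best
        if dist > h:
            break  # every later merge would also lie above the cut
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * n
    for k, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = k
    return labels


if __name__ == "__main__":
    # Hypothetical dissimilarities between four items.
    d = [
        [0.0, 0.1, 0.8, 0.9],
        [0.1, 0.0, 0.7, 0.8],
        [0.8, 0.7, 0.0, 0.2],
        [0.9, 0.8, 0.2, 0.0],
    ]
    print(average_link_cut(d, 0.5))  # [1, 1, 2, 2]
```

Recomputing every average on each pass is slow in principle, but for about 200 items it remains quick enough; a production implementation would update the averages incrementally instead.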