On Thu, May 24, 2012 at 9:31 AM, <r-help.20.tre...@spamgourmet.com> wrote:
> Dear R-Help,
>
> I have a clustering problem with hclust that I hope someone can help
> me with. Consider the classic hclust example:
>
> hc <- hclust(dist(USArrests), "ave")
> plot(hc)
>
> I would like to cut the tree up in such a way so as to avoid small
> clusters, so that we get a minimum number of items in each cluster,
> and therefore avoid singletons. e.g. in this example, you can see that
> Hawaii is split off onto its own at quite a high level. I would like
> to avoid having a single item clustered on its own like this. How can
> I achieve this?
>
> I have tried manually modifying the tree using dendrapply but have not
> been able to produce a valid solution thus far..
>
> Suggestions are welcome.
>
> Best wishes,
>
> Mark

Hi Mark,

I'm not sure how you want to handle the singletons if you don't want
them in a separate cluster. The package WGCNA (I'm the maintainer) and
its dependency dynamicTreeCut contain a few ways of avoiding singletons
as separate clusters.

One way is to remove them from the resulting clusters. To do this, use
the function cutreeStatic and specify the cut height and the minimum
number of elements in a cluster. For example,

    library(WGCNA)
    clusters1 = cutreeStatic(hc, cutHeight = 35, minSize = 3);

This way all branches that have size below 3 are labeled 0. To see what
you get, use the function plotDendroAndColors like this:

    plotDendroAndColors(hc, clusters1, rowText = clusters1);

Each color corresponds to a cluster, and the cluster label is shown by
the numbers (each number is at the start of the corresponding cluster).

If you'd like to assign everything but want to avoid clusters that are
too small, use the dynamic tree cut approach
(http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/BranchCutting/).
For example:

    clusters2 = cutreeDynamic(hc, distM = as.matrix(dist(USArrests)),
                              minClusterSize = 3, deepSplit = 2)

To show the clusters:

    plotDendroAndColors(hc, clusters2, rowText = clusters2);

If you think the clusters are too big, try setting deepSplit = 3 in the
cutreeDynamic call.

The dynamic tree cut basically assigns all singletons and branches with
size less than minClusterSize to the nearest existing cluster (notice
Hawaii and the Florida/North Carolina branch), thus basically combining
hierarchical clustering and a PAM-like step. Whether that's a good
approach for your research goal is a question you need to answer.

HTH,

Peter
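
For readers without WGCNA installed, the two ideas described above can be
roughly sketched in base R. This is only an illustration under stated
assumptions: the helper names drop_small_clusters and assign_to_nearest are
hypothetical, and "nearest" is taken here to mean the smallest mean distance
to a cluster's members. It is not the algorithm implemented in cutreeStatic
or cutreeDynamic, so the labels will generally differ from theirs.

```r
# Base-R sketch only; NOT the WGCNA / dynamicTreeCut implementation.
d  <- dist(USArrests)
hc <- hclust(d, "average")

# Hypothetical helper: cut the tree at a fixed height, then relabel every
# cluster with fewer than min_size members as 0 ("unassigned").
drop_small_clusters <- function(tree, cut_height, min_size) {
  labels <- cutree(tree, h = cut_height)
  sizes  <- table(labels)
  small  <- as.integer(names(sizes)[sizes < min_size])
  labels[labels %in% small] <- 0L
  labels
}

# Hypothetical helper: move each unassigned item (label 0) into the kept
# cluster whose members are closest to it on average.
# Assumption: "nearest" = smallest mean distance.
assign_to_nearest <- function(labels, dist_matrix) {
  kept <- setdiff(unique(labels), 0L)
  if (length(kept) == 0) return(labels)  # nothing to assign to
  out <- labels
  for (i in which(labels == 0L)) {
    mean_dist <- sapply(kept, function(k) mean(dist_matrix[i, labels == k]))
    out[i] <- kept[which.min(mean_dist)]
  }
  out
}

clusters_a <- drop_small_clusters(hc, cut_height = 35, min_size = 3)
clusters_b <- assign_to_nearest(clusters_a, as.matrix(d))

print(table(clusters_a))  # 0 = items from clusters that were too small
print(table(clusters_b))  # every item now belongs to a kept cluster
```

Comparing table(clusters_a) with table(clusters_b) shows where the small
branches ended up. Whether a mean-distance reassignment is sensible depends
on the data, just as with the dynamic tree cut.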

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.