This sounds like a problem better suited to either DBSCAN or OPTICS.
Neither algorithm requires a priori knowledge of the number of
clusters, and both let you specify a minimum number of points
(min_samples) that a dense neighborhood must contain to seed a cluster.
The OPTICS algorithm also produces a reachability ordering from which a
dendrogram-like hierarchy can be extracted and cut for sub-clusters if
need be.
DBSCAN is part of the stable release and has been for some time; OPTICS
is pending as a pull request, but it's stable and you can try it if you
like:
https://github.com/scikit-learn/scikit-learn/pull/1984
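
To make the DBSCAN suggestion concrete, here is a minimal sketch. The
synthetic blobs, the eps value, and the MIN_POINTS name are assumptions
for illustration only; eps in particular has to be tuned for real data.
Since min_samples is a density threshold rather than a strict size
guarantee, the sketch also relabels any undersized cluster as noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

MIN_POINTS = 40  # minimum cluster size asked for in the question

# Synthetic data for illustration only: three well-separated blobs.
X, _ = make_blobs(
    n_samples=600,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# eps is a guess that suits this toy data; tune it for real data.
labels = DBSCAN(eps=1.0, min_samples=MIN_POINTS).fit_predict(X)

# min_samples controls density, not final cluster size, so enforce the
# size limit explicitly: undersized clusters are relabeled as noise (-1).
for label in np.unique(labels):
    if label != -1 and np.sum(labels == label) < MIN_POINTS:
        labels[labels == label] = -1

# Map each remaining cluster label to its number of members.
cluster_sizes = {
    int(label): int(np.sum(labels == label))
    for label in np.unique(labels)
    if label != -1
}
print(cluster_sizes)
```

Points labeled -1 are treated as noise, so not every data point ends up
in a cluster. If every point must be assigned, the noise points would
need a separate step, such as attaching each to its nearest cluster.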
Cheers,
Shane
On 06/30, Ariani A wrote:
I want to perform agglomerative clustering, but I do not know the number
of clusters beforehand. I do want every cluster to have at least 40 data
points in it. How can I apply this to sklearn.agglomerative clustering?
Should I use a dendrogram and cut it somehow? I have no idea how to relate
the dendrogram to this or how to cut it. Any help will be appreciated!
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*