This sounds like a problem better suited to either DBSCAN or OPTICS. Neither algorithm requires a priori knowledge of the number of clusters, and both let you specify a minimum number of points required to form a cluster. The OPTICS algorithm will also produce a dendrogram that you can cut for sub-clusters if need be.

DBSCAN is part of the stable release and has been for some time; OPTICS is pending as a pull request, but it's stable and you can try it if you like:

https://github.com/scikit-learn/scikit-learn/pull/1984
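
For the DBSCAN route, here is a minimal sketch of what I mean. The synthetic data, the eps value, and the variable names are only assumptions for illustration; eps in particular has to be tuned to the scale of your own data. Note that min_samples is the density threshold for a core point, so clusters will normally have at least that many members, though a border point shared between two clusters can occasionally leave one slightly short.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the real data (assumption): three well-separated blobs.
X, _ = make_blobs(
    n_samples=600,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# min_samples=40 mirrors the "at least 40 points" requirement from the question.
# eps=1.0 is an illustrative guess suited to this toy data only.
db = DBSCAN(eps=1.0, min_samples=40).fit(X)
labels = db.labels_  # label -1 marks points treated as noise

# Count the members of each cluster, ignoring noise points.
cluster_ids, sizes = np.unique(labels[labels != -1], return_counts=True)
n_clusters = len(cluster_ids)
n_noise = int(np.sum(labels == -1))

print("clusters found:", n_clusters)
print("cluster sizes:", dict(zip(cluster_ids.tolist(), sizes.tolist())))
print("noise points:", n_noise)
```

The number of clusters falls out of the fit rather than being passed in, and anything too sparse to meet the density threshold ends up labelled as noise instead of being forced into a small cluster.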

Cheers,
Shane

On 06/30, Ariani A wrote:
I want to perform agglomerative clustering, but I do not know the number of
clusters beforehand. I do want every cluster to have at least 40 data
points in it. How can I apply this to sklearn's agglomerative clustering?
Should I use a dendrogram and cut it somehow? I have no idea how to relate
the dendrogram to this or how to cut it. Any help will be appreciated!

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn


--
*PhD candidate & Research Assistant*
*Cooperative Institute for Research in Environmental Sciences (CIRES)*
*University of Colorado at Boulder*