Hello,

I know the classic methods for choosing the number of clusters, but I do not
know any methods specific to bi-clustering. Nevertheless, the classic ones can
be applied to bi-clustering. Brief descriptions:
- Elbow method. You expect the distortion (mean distance of a point to its
cluster's center) to decrease sharply while k < optimal k, and to decrease
very slowly for k > optimal k.
- Stability. You run your algorithm several times, each time on a random 80%
subset of the data. The "optimal" number of clusters is the one where the
algorithm's result is the most stable. Stability is defined as follows: for
runs of the algorithm on subset A and subset B, it is the similarity of the
two results on the intersection of A and B.
("A stability based method for discovering structure in clustered data", A
Ben-Hur <https://scholar.google.fr/citations?user=I1fl4oAAAAAJ&hl=fr&oi=sra>,
A Elisseeff, I Guyon
<https://scholar.google.fr/citations?user=6n-zAFEAAAAJ&hl=fr&oi=sra>)
- Gap statistic. You compare the clustering results on real data versus
clustering on a random dataset. With "optimal" k, clusters on the real
dataset should have much lower distortion than on the random dataset.
("Estimating the number of clusters in a data set via the gap statistic",
Tibshirani).
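
As a rough sketch of the elbow and gap statistic ideas above (this is not the
code from my PR, just an illustration), assuming k-means as the clustering
algorithm and a toy dataset with 4 blobs. The helper names `distortion` and
`gap_values` are made up for this example, the distortion here uses squared
distances (KMeans' `inertia_` divided by the number of points), and the
"random dataset" is assumed to be uniform noise in the bounding box of the data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def distortion(X, k, seed=0):
    """Mean squared distance of each point to its closest k-means center."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    # inertia_ is the sum of squared distances to the closest center.
    return model.inertia_ / X.shape[0]


def gap_values(X, k_values, n_refs=5, seed=0):
    """Gap(k) = mean(log(reference distortion)) - log(data distortion)."""
    rng = np.random.RandomState(seed)
    low, high = X.min(axis=0), X.max(axis=0)
    # Reference datasets (assumption): uniform noise in the bounding box of X.
    refs = [rng.uniform(low, high, size=X.shape) for _ in range(n_refs)]
    gaps = {}
    for k in k_values:
        ref_log = np.mean([np.log(distortion(R, k, seed)) for R in refs])
        gaps[k] = ref_log - np.log(distortion(X, k, seed))
    return gaps


# Toy data: 4 well-separated blobs, so the "true" k is 4.
X, _ = make_blobs(n_samples=300,
                  centers=[[0, 0], [0, 5], [5, 0], [5, 5]],
                  cluster_std=0.5, random_state=0)

ks = range(1, 7)
curve = {k: distortion(X, k) for k in ks}
gaps = gap_values(X, ks)
for k in ks:
    print(k, round(curve[k], 3), round(gaps[k], 3))
```

For the elbow, look for the k after which the distortion column stops dropping
quickly. For the gap, a large value means the real data clusters much better
than the noise does at that k. The paper uses a slightly more careful selection
rule that also accounts for the spread over the reference datasets.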

Those methods are not available in scikit-learn at the moment. I made a PR
with examples, which may be easier to understand:
https://github.com/scikit-learn/scikit-learn/pull/4301

For bi-clustering, if you define distance or distortion well (i.e. what it
means for points to be close), these methods should work well.
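
To make that concrete, here is one possible distortion for checkerboard
bi-clusters. It is only an assumption on my side, not an established score:
the mean squared deviation of each entry from the mean of its (row cluster,
column cluster) block. The function name `checkerboard_distortion` is made up
for this sketch; `SpectralBiclustering` and `make_checkerboard` are the ones
used in the scikit-learn example:

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard


def checkerboard_distortion(X, row_labels, col_labels):
    """Mean squared deviation of the entries from their block mean."""
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    total = 0.0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            # Sub-matrix of the rows in cluster r and the columns in cluster c.
            block = X[np.ix_(row_labels == r, col_labels == c)]
            if block.size:
                total += ((block - block.mean()) ** 2).sum()
    return total / X.size


# Toy data: a shuffled 4 x 3 checkerboard with some noise.
data, _, _ = make_checkerboard(shape=(120, 90), n_clusters=(4, 3),
                               noise=5, shuffle=True, random_state=0)

scores = {}
for k in range(2, 6):
    model = SpectralBiclustering(n_clusters=(k, k), method="log",
                                 random_state=0).fit(data)
    scores[k] = checkerboard_distortion(data, model.row_labels_,
                                        model.column_labels_)
    print(k, round(scores[k], 3))
```

You can then feed this distortion into the elbow or gap approach in place of
the usual distance to the cluster center. Whether this particular definition
suits your data is something you would have to check.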

Best,

Arnaud

2015-08-10 17:30 GMT+02:00 Sheila the angel <from.d.pu...@gmail.com>:

> How does one find the optimal number of bi-clusters in a dataset?
> In the example
>
> http://scikit-learn.org/stable/auto_examples/bicluster/plot_spectral_biclustering.html
>
> the function "consensus_score" computes the score against the known
> data-set.
> However, in a real situation this is not known.
>
> What are the options for optimizing number of biclusters?
>
> Best,
> --
> Sheila
>
>
> ------------------------------------------------------------------------------
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>