On 21 February 2013 22:53, nipun batra <[email protected]> wrote:
> Hi,
> Is there an existing way to get the variance of all the samples belonging
> to a cluster about that cluster's mean/center (as chosen by the clustering
> algorithm)?
> If not, is the following approach optimal?
> For each cluster label:
>     For all ids belonging to this cluster label:
>         Find the variance of Sequence[ids] about the mean/cluster center
>
> Also, what is the best way to deal with outliers while performing
> clustering? I saw that DBSCAN handles them inherently.
>
>
>
> ------------------------------------------------------------------------------
> Everyone hates slow websites. So do we.
> Make your web apps faster with AppDynamics
> Download AppDynamics Lite for free today:
> http://p.sf.net/sfu/appdyn_d2d_feb
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>
Hello there,
Have a look at the code in sklearn/metrics/cluster/unsupervised.py,
specifically the _intra_cluster_distance function.
For each instance, it takes the distances to all other instances and
calculates the mean distance to the other instances in the same cluster.
This isn't quite what you want, but it may help you write code that does
it (try the same code, replacing np.mean with np.var).
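A minimal sketch of that suggestion follows. It does not call
_intra_cluster_distance, which is a private helper whose name and signature
may differ between releases; instead it recomputes the same kind of quantity
with sklearn.metrics.pairwise_distances and np.var. The function name
intra_cluster_distance_variance is hypothetical. Note that it measures the
spread of a sample's distances to its cluster-mates, not the variance about
a cluster center.

```python
import numpy as np
from sklearn.metrics import pairwise_distances


def intra_cluster_distance_variance(X, labels, metric="euclidean"):
    """Hypothetical helper: for each sample, the variance of its distances
    to the other samples in the same cluster (np.var in place of np.mean).

    Samples that are alone in their cluster get 0.0.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full n x n distance matrix: fine for small data, O(n^2) memory otherwise
    distances = pairwise_distances(X, metric=metric)
    result = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        # Other members of sample i's cluster (exclude i itself)
        same_cluster = labels == labels[i]
        same_cluster[i] = False
        if same_cluster.any():
            result[i] = np.var(distances[i, same_cluster])
    return result


X = [[0.0, 0.0], [0.0, 1.0], [0.0, 3.0], [10.0, 10.0]]
labels = [0, 0, 0, 1]
print(intra_cluster_distance_variance(X, labels))  # values 1.0, 0.25, 0.25, 0.0
```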
There isn't a generic function in sklearn for this because there isn't a
generic method for it. K-means produces a centroid for each cluster, which
makes the calculation easy; other methods don't always give you one. Any
generic method we implemented would be a hack, which goes against one of
the principles of this project (only include proven methods).
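For k-means, the per-cluster loop described in the question can be sketched
as below, assuming "variance about the center" means the mean squared
Euclidean distance of a cluster's members to its center. per_cluster_variance
is a hypothetical helper; KMeans, labels_ and cluster_centers_ are the
scikit-learn estimator and its fitted attributes.

```python
import numpy as np
from sklearn.cluster import KMeans


def per_cluster_variance(X, labels, centers):
    """Hypothetical helper: mean squared Euclidean distance from each
    cluster's members to its center.

    When the center is the cluster mean (as for k-means at convergence),
    this equals the cluster's per-feature variances summed over features.
    Returns {cluster index: variance}; empty clusters map to NaN.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    variances = {}
    for k, center in enumerate(np.asarray(centers, dtype=float)):
        members = X[labels == k]
        if len(members) == 0:
            variances[k] = float("nan")
            continue
        squared_dist = np.sum((members - center) ** 2, axis=1)
        variances[k] = float(squared_dist.mean())
    return variances


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(per_cluster_variance(X, km.labels_, km.cluster_centers_))
```

As a sanity check, multiplying each value by its cluster size and summing
should reproduce km.inertia_ (up to floating-point error), since inertia_ is
the sum of squared distances of samples to their closest center.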
As for the outliers, you are right: DBSCAN does this. It labels points as
outliers (noise) based on the parameters you give it (eps and min_samples).
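A small illustration of DBSCAN's outlier handling on made-up data: points
that end up in no cluster (neither core points nor within eps of one) get
the label -1 in labels_. The eps and min_samples values are chosen for this
toy data only and are assumptions, not recommendations.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of four points plus one isolated point (toy data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
              [20.0, 20.0]])

# eps: neighbourhood radius; min_samples: number of points (including the
# point itself) needed within eps for a point to count as a core point
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

labels = db.labels_
outliers = X[labels == -1]   # DBSCAN labels noise points -1
inliers = X[labels != -1]
print(labels)
print(outliers)
```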
You may have some luck running one of the dedicated outlier detection
methods first, as they are a little more automated. See here:
http://scikit-learn.org/stable/modules/outlier_detection.html
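One way to combine the two ideas, sketched under assumptions: drop the points
flagged by an outlier detector, then cluster the rest. LocalOutlierFactor is
used here only as an example (it is in current releases but was added after
this thread, and the estimators on that page have changed over time);
n_neighbors=5, the toy data, and the choice of KMeans afterwards are all
illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor

# Two 4x4 grids of points (spacing 0.1) plus one far-away point (toy data)
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float) * 0.1
X = np.vstack([grid, grid + 5.0, [[20.0, 20.0]]])

# Step 1: flag outliers (fit_predict returns 1 for inliers, -1 for outliers)
lof = LocalOutlierFactor(n_neighbors=5, contamination="auto")
inlier_mask = lof.fit_predict(X) == 1

# Step 2: cluster only the points that were kept
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X[inlier_mask])
print("dropped as outliers:", X[~inlier_mask])
print("cluster labels of kept points:", km.labels_)
```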
Hope that helps,
Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)