On 05/25/2012 09:02 AM, Gael Varoquaux wrote:
> Hi list,
>
> A lot of clustering algorithms can be initialized randomly and thus give
> different results on the same data, because the criterion they optimize
> is non-convex.
>
> One trivial source of non-reproducibility is the fact that labels can be
> permuted: even if the algorithm finds the same clusters, it may give
> them different labels. This makes testing and exploration harder,
> but it's easy to fix.
>
> Indeed, if we adopt the convention that cluster labels are numbered in
> the order in which the training samples are given, all we need to do is
> add the following line at the end of the fit:
>
>      labels = np.unique(labels, return_index=True)[1][labels]
-0
Why not, but this is easy and safe to do only in some cases:
-- do not forget to permute all the label-related info (cluster centers,
weights, covariances)...
-- in the case of hierarchical clustering, you need to decide whether to
break the consistency of the labelling across levels of the hierarchy.
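A minimal NumPy sketch of the relabelling plus the permutation of label-related info, assuming flat (non-hierarchical) labels whose ids index the rows of a per-cluster array such as the cluster centers, and assuming every cluster is non-empty. `relabel_by_first_appearance` is a hypothetical helper, not a scikit-learn function. Note that the one-liner quoted above maps each sample to the index of the first sample in its cluster: that is reproducible, but not compact (for labels [2, 2, 0, 1, 0] it gives [0, 0, 2, 3, 2]), so the result cannot be used directly to index the cluster centers.

```python
import numpy as np


def relabel_by_first_appearance(labels, centers=None):
    """Renumber labels 0..k-1 in order of first appearance in the data.

    Hypothetical helper (not part of scikit-learn). Assumes 1-D labels and,
    if `centers` is given, one row per original label id with no empty
    clusters; the rows of `centers` are permuted to match the new labels.
    """
    labels = np.asarray(labels)
    # uniq: sorted distinct ids; first: index of each id's first sample;
    # inverse: position of each sample's id within uniq
    uniq, first, inverse = np.unique(labels, return_index=True,
                                     return_inverse=True)
    # Sort the distinct ids by where they first appear in the data
    order = np.argsort(first)
    # rank[j] is the new label of the j-th sorted id
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    new_labels = rank[inverse]
    if centers is None:
        return new_labels
    # New label i corresponds to the original id uniq[order[i]]
    new_centers = np.asarray(centers)[uniq[order]]
    return new_labels, new_centers
```

For labels [2, 2, 0, 1, 0] this returns [0, 0, 1, 2, 1], and the permuted labelling [1, 1, 2, 0, 2] of the same partition gives the same result, which is the reproducibility being asked for. Weights or covariances stored per cluster would need the same `uniq[order]` row permutation as the centers.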
>
> provided that the label ids are not used elsewhere, of course.
Yes, exactly.
> I'd like to do this for kmeans, and maybe a few other algorithms where it
> is really easy to do. What do people think?
>
> Gael
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
