Dear all,

Three clustering algorithms can take a distance or similarity matrix as input instead of the observations (AgglomerativeClustering <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>, AffinityPropagation <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>, and DBSCAN <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>), but their documentation is inconsistent on this point.
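To make the distance/similarity distinction concrete, here is a minimal sketch of how I understand each estimator is meant to be called with a precomputed matrix. The toy data, the eps/min_samples values and the choice of average linkage are made up for illustration. The try/except around AgglomerativeClustering is an assumption about the installed version: the keyword is 'affinity' in the documentation quoted in this message, while newer releases call it 'metric'.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering, DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (illustrative only): two well-separated groups of four points.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)

D = pairwise_distances(X, metric="euclidean")  # distances: small means close
S = -(D ** 2)                                  # similarities: large means close

# DBSCAN expects a precomputed *distance* matrix.
dbscan_labels = DBSCAN(eps=1.5, min_samples=2,
                       metric="precomputed").fit_predict(D)

# AffinityPropagation expects a precomputed *similarity* matrix.
ap_labels = AffinityPropagation(affinity="precomputed").fit_predict(S)

# AgglomerativeClustering expects a precomputed *distance* matrix.
# "ward" linkage only accepts euclidean input, so average linkage is used.
# Assumption: the keyword is 'metric' in newer releases, 'affinity' in older ones.
try:
    agglo = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                    linkage="average")
except TypeError:
    agglo = AgglomerativeClustering(n_clusters=2, affinity="precomputed",
                                    linkage="average")
agglo_labels = agglo.fit_predict(D)

print(dbscan_labels, ap_labels, agglo_labels)
```

Note that passing S to DBSCAN or AgglomerativeClustering, or D to AffinityPropagation, raises no error by itself; it just gives meaningless clusters, which is why the documentation should say which kind of matrix is expected.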

*DBSCAN:*

The documentation explains clearly how to run DBSCAN with a precomputed distance matrix.

Constructor:
    metric: If metric is “precomputed”, X is assumed to be a distance matrix and must be square.
fit / fit_predict:
    X: A feature array, or array of distances between samples if metric='precomputed'.

*AffinityPropagation:*

Constructor:
    affinity: Which affinity to use. At the moment 'precomputed' and 'euclidean' are supported. 'euclidean' uses the negative squared euclidean distance between points.
fit:
    X: Data matrix or, if affinity is 'precomputed', matrix of similarities / affinities.
fit_predict:
    X: Input data.

Can X also be a matrix of similarities in fit_predict? Shouldn't fit and fit_predict share the same documentation for the input X?

*AgglomerativeClustering:*

Constructor:
    affinity: Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is “ward”, only “euclidean” is accepted.

The name of the parameter 'affinity' seems misleading, since it corresponds to distance functions rather than similarity functions.

fit:
    X: The samples a.k.a. observations.
fit_predict:
    X: Input data.

The documentation of fit and fit_predict does not specify that X can also be a matrix of distances. Users may be unsure whether to provide a distance or a similarity matrix to AgglomerativeClustering.

The documentation of fit and fit_predict can easily be updated. Renaming the 'affinity' parameter is harder, since it involves an API change.

What do you think of these potential documentation updates?

Cheers,
Anaël Beaugnon

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn