Hi Anaël,

Thanks for spotting these inconsistencies. You are very welcome to open pull requests and/or issues on the GitHub tracker (cf. http://scikit-learn.org/stable/developers/contributing.html#contributing-code).

The documentation issue should be straightforward to fix. The parameter renaming would need a proper deprecation cycle (cf. http://scikit-learn.org/stable/developers/contributing.html#deprecation).
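A deprecation cycle for a renamed parameter generally keeps the old name working for a transition period while warning users to switch. The following is only a minimal, hypothetical sketch of that pattern applied to the `affinity` parameter of AgglomerativeClustering. The class `ToyAgglomerativeClustering`, the new name `metric` (borrowed from DBSCAN), and the sentinel default `"deprecated"` are all illustrative assumptions, not scikit-learn code. It uses `FutureWarning` because Python shows that category to end users by default, whereas `DeprecationWarning` is hidden by default outside `__main__`.

```python
import warnings


class ToyAgglomerativeClustering:
    """Hypothetical estimator sketching a rename of `affinity` to `metric`.

    Not scikit-learn code: only the parameter handling is illustrated.
    """

    def __init__(self, metric=None, affinity="deprecated"):
        # Store parameters unchanged; validation is deferred to fit().
        self.metric = metric
        self.affinity = affinity

    def _resolve_metric(self):
        if self.affinity != "deprecated":
            # Setting both names is ambiguous, so refuse it outright.
            if self.metric is not None:
                raise ValueError("Set only `metric`; `affinity` is deprecated.")
            # The old name still works during the transition, with a warning.
            warnings.warn(
                "`affinity` is deprecated and will be removed in a future "
                "release; use `metric` instead.",
                FutureWarning,
            )
            return self.affinity
        # Neither name given: fall back to the default distance.
        return "euclidean" if self.metric is None else self.metric

    def fit(self, X):
        # Only the resolved name is used from here on.
        self.metric_ = self._resolve_metric()
        # ... the actual clustering on X would happen here ...
        return self
```

Once the transition period ends, `affinity` and the `_resolve_metric` branch can simply be deleted, leaving `metric` as the only name.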
See you on GitHub,
Tom

2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaug...@ssi.gouv.fr>:
> Dear all,
>
> Three clustering algorithms can take as input distance or similarity
> matrices instead of the observations (AgglomerativeClustering
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
> AffinityPropagation
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
> and DBSCAN
> <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
> but there are inconsistencies in their documentation.
>
> *DBSCAN:*
> The documentation explains clearly how to run DBSCAN with a precomputed
> distance matrix.
> Constructor:
>   metric: If metric is “precomputed”, X is assumed to be a distance
>   matrix and must be square.
> fit / fit_predict:
>   X: A feature array, or array of distances between samples if
>   metric='precomputed'.
>
> *AffinityPropagation:*
> Constructor:
>   affinity: Which affinity to use. At the moment precomputed and euclidean
>   are supported. euclidean uses the negative squared euclidean distance
>   between points.
> fit:
>   X: Data matrix or, if affinity is precomputed, matrix of similarities /
>   affinities.
> fit_predict:
>   X: Input data.
> Can X also be a matrix of similarities here? Shouldn't fit and fit_predict
> share the same documentation for the input X?
>
> *AgglomerativeClustering:*
> Constructor:
>   affinity: Metric used to compute the linkage. Can be “euclidean”, “l1”,
>   “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is “ward”,
>   only “euclidean” is accepted.
> The name of the parameter 'affinity' seems misleading, since it
> corresponds to distance functions, not similarity functions.
> fit:
>   X: The samples a.k.a. observations.
> fit_predict:
>   X: Input data.
> The documentation of fit and fit_predict does not specify that X can
> also be a matrix of distances.
>
> Users may be unsure whether they should provide a distance or a
> similarity matrix to AgglomerativeClustering.
> The documentation of fit and fit_predict can easily be updated. Renaming
> the 'affinity' parameter is more difficult, since it involves an API
> change.
>
> What do you think of these potential updates to the documentation?
>
> Cheers,
>
> Anaël Beaugnon
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
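The confusion raised here comes down to two opposite conventions. Per the docstrings quoted above, DBSCAN with metric='precomputed' and AgglomerativeClustering with affinity='precomputed' take distances (0 on the diagonal, larger means less alike), while AffinityPropagation with affinity='precomputed' takes similarities, and its 'euclidean' option uses the negative squared Euclidean distance (larger, i.e. closer to 0, means more alike). The pure-Python sketch below builds both matrices for the same points to make the difference concrete. The helper names are hypothetical, and no scikit-learn call is involved.

```python
def squared_euclidean(a, b):
    # Sum of squared coordinate differences between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def distance_matrix(points):
    # Euclidean distances: 0 on the diagonal, larger = less alike.
    return [[squared_euclidean(p, q) ** 0.5 for q in points] for p in points]


def negative_sq_euclidean_similarity(points):
    # Similarities as described for AffinityPropagation's 'euclidean' option:
    # 0 on the diagonal, more negative = less alike.
    return [[-squared_euclidean(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
D = distance_matrix(points)
S = negative_sq_euclidean_similarity(points)
# D[0][1] == 5.0 while S[0][1] == -25.0
```

Both matrices rank point 2 as the closest match to point 0, but one does so with the smallest entry and the other with the largest, which is why each estimator's documentation needs to state which kind of matrix X should be.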