+1 for a PR on the fit_predict docs. The inconsistency is probably due to the
inheritance structure, though it's weird that DBSCAN has the correct docs.
I'm not sure about renaming affinity, but we can discuss that. I agree
it's misleading.
On 5/23/18 8:01 AM, Tom DLT wrote:
Hi Anaël,
Thanks for spotting these inconsistencies.
You are very welcome to open pull-requests and/or issues on the GitHub
tracker (cf.
http://scikit-learn.org/stable/developers/contributing.html#contributing-code)
The documentation issue should be straightforward.
The parameter renaming would need a proper deprecation cycle (cf.
http://scikit-learn.org/stable/developers/contributing.html#deprecation).
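For readers unfamiliar with what a deprecation cycle involves, here is a minimal sketch of how renaming `affinity` might be handled, assuming a hypothetical new name `metric` (the name DBSCAN already uses). The class `ToyClustering` is hypothetical and is not scikit-learn code. The `"deprecated"` sentinel default and the use of `FutureWarning` are assumptions about common practice; the contributing guide linked above is the authority on the actual rules.

```python
import warnings


class ToyClustering:
    """Hypothetical estimator (not scikit-learn code) part-way through
    renaming its `affinity` parameter to `metric`."""

    def __init__(self, metric=None, affinity="deprecated"):
        # Only store the parameters; resolution happens in fit().
        self.metric = metric
        self.affinity = affinity

    def _resolve_metric(self):
        if self.affinity != "deprecated":
            # The old name was passed explicitly.
            if self.metric is not None:
                raise ValueError("Pass either 'metric' or 'affinity', not both.")
            # Keep honouring the old name for now, but tell users to switch.
            warnings.warn(
                "'affinity' is deprecated and will be removed in a future "
                "release; use 'metric' instead.",
                FutureWarning,
            )
            return self.affinity
        # The new name (or its default) is used without any warning.
        return "euclidean" if self.metric is None else self.metric

    def fit(self, X):
        # A real estimator would cluster X here; this sketch only records
        # which distance was selected.
        self.metric_ = self._resolve_metric()
        return self
```

Old code such as `ToyClustering(affinity="precomputed")` keeps working but warns, and after the announced number of releases the `affinity` branch can be deleted.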
See you on GitHub,
Tom
2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaug...@ssi.gouv.fr>:
Dear all,
Three clustering algorithms can take as input distance or
similarity matrices instead of the observations
(AgglomerativeClustering
<http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>,
AffinityPropagation
<http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>,
and DBSCAN
<http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>),
but there are inconsistencies in their documentation.
*DBSCAN:*
The documentation clearly explains how to run DBSCAN with a
precomputed distance matrix.
Constructor:
/metric: If metric is “precomputed”, X is assumed to be a
distance matrix and must be square./
fit and fit_predict:
/X: A feature array, or array of distances between samples
if metric='precomputed'./
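As a quick illustration of the DBSCAN behaviour documented above, the following sketch fits DBSCAN once on the observations and once on their precomputed Euclidean distance matrix; both should give the same labels. It assumes a reasonably recent scikit-learn and NumPy, and the toy data and the `eps`/`min_samples` values are made up for this example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data: two small, well-separated groups of 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Square n_samples x n_samples matrix of pairwise Euclidean *distances*.
D = pairwise_distances(X, metric="euclidean")

# Same estimator settings; only the input and the metric differ.
labels_from_X = DBSCAN(eps=1.5, min_samples=2).fit_predict(X)
labels_from_D = DBSCAN(eps=1.5, min_samples=2, metric="precomputed").fit_predict(D)

print(labels_from_X)  # [0 0 0 1 1 1]
print(labels_from_D)  # [0 0 0 1 1 1]
```

The only change on the estimator side is `metric="precomputed"`; X must then be the square distance matrix rather than the feature array.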
*AffinityPropagation:*
Constructor:
/affinity: Which affinity to use. At the moment 'precomputed'
and 'euclidean' are supported. 'euclidean' uses the negative
squared euclidean distance between points./
fit:
/X: Data matrix or, if affinity is 'precomputed', matrix
of similarities / affinities./
fit_predict:
/X: Input data./
Can X also be a matrix of similarities here? Shouldn't fit and
fit_predict share the same documentation for the input X?
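To make the distance/similarity distinction concrete, here is a sketch showing that `affinity="precomputed"` expects *similarities*, where larger values mean "more alike". Building S as the negative squared Euclidean distance, the quantity the 'euclidean' option is documented to use, should reproduce the `affinity="euclidean"` result. It assumes a recent scikit-learn (the `random_state` parameter is not available in old releases), and the toy data are made up. `random_state` is fixed in both fits because, as far as I know, the solver adds a little random noise to the similarity matrix.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

# Toy data: two small, well-separated groups of 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Similarity matrix: negative squared Euclidean distance (higher = closer).
S = -euclidean_distances(X, squared=True)

# Fit once on the observations and once on the precomputed similarities.
ap_euclidean = AffinityPropagation(affinity="euclidean", random_state=0)
ap_precomputed = AffinityPropagation(affinity="precomputed", random_state=0)

labels_euclidean = ap_euclidean.fit_predict(X)
labels_precomputed = ap_precomputed.fit_predict(S)

print(labels_euclidean)
print(labels_precomputed)
```

Passing a distance matrix here instead of S would invert the meaning of the entries, since AffinityPropagation would treat far-apart points as the most similar.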
*AgglomerativeClustering:*
Constructor:
/affinity: Metric used to compute the linkage. Can be “euclidean”,
“l1”, “l2”, “manhattan”, “cosine”, or “precomputed”. If linkage is
“ward”, only “euclidean” is accepted./
The name of the parameter 'affinity' seems misleading, since it
refers to distance functions rather than similarity functions.
fit:
/X: The samples a.k.a. observations./
fit_predict:
/X: Input data./
The documentation of fit and fit_predict does not specify that X
can also be a matrix of distances, so users may not know whether
they should provide a distance or a similarity matrix to
AgglomerativeClustering.
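For AgglomerativeClustering the precomputed matrix has to contain *distances*, as with DBSCAN and unlike AffinityPropagation. The sketch below makes the same assumptions as before (recent scikit-learn, made-up toy data). Because 'ward' linkage only accepts Euclidean input, it uses 'average' linkage. The keyword is looked up at run time because later scikit-learn releases renamed `affinity` to `metric`. The `get_params()` lookup is just a convenience for this example, not a recommended pattern.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Toy data: two small, well-separated groups of 2-D observations.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Precomputed Euclidean *distances*, not similarities.
D = pairwise_distances(X, metric="euclidean")

# Use whichever keyword this installed version exposes ('affinity' or 'metric').
params = AgglomerativeClustering().get_params()
distance_kw = "metric" if "metric" in params else "affinity"

# 'ward' would reject a precomputed matrix, so use 'average' linkage.
model = AgglomerativeClustering(n_clusters=2, linkage="average",
                                **{distance_kw: "precomputed"})
labels_from_D = model.fit_predict(D)

# Same clustering computed directly from the observations, for comparison.
labels_from_X = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)

print(labels_from_D)
print(labels_from_X)
```

The two label arrays should describe the same partition, although the integer assigned to each cluster is not guaranteed to match.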
The documentation of fit and fit_predict can be easily updated. As
for the name of the 'affinity' parameter, it is more difficult
since it involves an API change.
What do you think of these potential updates to the documentation?
Cheers,
Anaël Beaugnon
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org <mailto:scikit-learn@python.org>
https://mail.python.org/mailman/listinfo/scikit-learn
<https://mail.python.org/mailman/listinfo/scikit-learn>