Re: [scikit-learn] Inconsistencies in clustering documentations

Andreas Mueller Wed, 23 May 2018 09:12:07 -0700

+1 for a PR on fit_predict docs. This is probably due to the inheritance structure.

Though it's weird that DBSCAN has the correct docs.

I'm not sure about renaming affinity, but we can discuss that. I agree it's misleading.



On 5/23/18 8:01 AM, Tom DLT wrote:

Hi Anaël,

Thanks for spotting these inconsistencies.

You are very welcome to open pull requests and/or issues on the GitHub tracker (cf. http://scikit-learn.org/stable/developers/contributing.html#contributing-code).

The documentation issue should be straightforward.

The parameter renaming would need a proper deprecation cycle (cf. http://scikit-learn.org/stable/developers/contributing.html#deprecation).
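
For illustration, a deprecation cycle for a renamed constructor parameter could look roughly like the sketch below. This is not scikit-learn code: ToyClusterer is a hypothetical class, and the "deprecated" sentinel, the warning category, and the message wording are assumptions made for the example. The idea is only that the old name keeps working for a while but emits a warning that points to the new name.

```python
import warnings


class ToyClusterer:
    """Hypothetical estimator, used only to illustrate a parameter rename."""

    def __init__(self, metric="euclidean", affinity="deprecated"):
        # Store constructor arguments untouched; validation happens in fit.
        self.metric = metric
        self.affinity = affinity

    def _resolve_metric(self):
        # The old name still works during the deprecation period, but warns.
        if self.affinity != "deprecated":
            warnings.warn(
                "'affinity' is deprecated and will be removed in a future "
                "release; use 'metric' instead.",
                FutureWarning,
            )
            return self.affinity
        return self.metric

    def fit(self, X):
        # X is ignored here; the sketch only shows how the name is resolved.
        self.metric_ = self._resolve_metric()
        return self
```

Code that still passes affinity="precomputed" keeps working and sees the warning; code that passes metric="precomputed" runs silently.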


See you on GitHub,

Tom

2018-05-23 11:50 GMT+02:00 Beaugnon Anael <anael.beaug...@ssi.gouv.fr>:


    Dear all,

    Three clustering algorithms can take a distance or similarity matrix
    as input instead of the observations, but their documentation is
    inconsistent:

      - AgglomerativeClustering
        <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering>
      - AffinityPropagation
        <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html#sklearn.cluster.AffinityPropagation>
      - DBSCAN
        <http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN>


    *DBSCAN:*
        The documentation explains clearly how to run DBSCAN with a
        precomputed distance matrix.
        Constructor:
            metric: If metric is “precomputed”, X is assumed to be a
            distance matrix and must be square.
        fit / fit_predict:
            X: A feature array, or array of distances between samples
            if metric='precomputed'.
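
The documented DBSCAN behaviour can be sketched as follows. The toy data, eps, and min_samples are assumptions chosen for illustration; only metric="precomputed" and the meaning of X come from the documentation quoted above.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (illustrative only): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.5]])

# Square matrix of pairwise Euclidean distances, shape (n_samples, n_samples).
D = pairwise_distances(X, metric="euclidean")

# With metric="precomputed", the X argument of fit / fit_predict is D itself.
labels_from_distances = DBSCAN(
    eps=1.0, min_samples=2, metric="precomputed"
).fit_predict(D)

# For comparison: the same settings applied to the raw feature array.
labels_from_features = DBSCAN(eps=1.0, min_samples=2).fit_predict(X)
```

Both calls should produce the same labelling here, since the precomputed matrix holds the same Euclidean distances that DBSCAN would otherwise compute from the features.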


    *AffinityPropagation:*
        Constructor:
            affinity: Which affinity to use. At the moment
            'precomputed' and 'euclidean' are supported. 'euclidean' uses
            the negative squared euclidean distance between points.
        fit:
            X: Data matrix or, if affinity is 'precomputed', matrix
            of similarities / affinities.
        fit_predict:
            X: Input data.
        Can't X also be a matrix of similarities here? Shouldn't fit and
        fit_predict share the same documentation for the input X?
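
A small sketch of the point about AffinityPropagation: with affinity="precomputed", both fit and fit_predict take the similarity matrix as X. The toy data and the choice of similarity (negative squared Euclidean distance, mirroring what the docs describe for 'euclidean') are assumptions for illustration, and random_state is fixed only to make the run repeatable (that argument is not available in older releases).

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics.pairwise import euclidean_distances

# Toy data (illustrative only): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.5]])

# Similarity = negative squared Euclidean distance (larger means more similar).
S = -euclidean_distances(X, squared=True)

# fit(...).labels_ and fit_predict(...) both receive the similarity matrix.
labels_fit = AffinityPropagation(
    affinity="precomputed", random_state=0
).fit(S).labels_
labels_fit_predict = AffinityPropagation(
    affinity="precomputed", random_state=0
).fit_predict(S)
```

The two label arrays should be identical, which is why the description of X arguably ought to be the same for both methods.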


    *AgglomerativeClustering:*
        Constructor:
            affinity: Metric used to compute the linkage. Can be “euclidean”,
            “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is
            “ward”, only “euclidean” is accepted.
        The name of the parameter 'affinity' seems misleading, since it
        does not correspond to similarity functions, but to distance
        functions.
        fit:
            X: The samples a.k.a. observations.
        fit_predict:
            X: Input data.
        The documentation of fit and fit_predict does not specify that X
        can also be a matrix of distances.
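
To make the AgglomerativeClustering case concrete, here is a hedged sketch in which X is a distance matrix, not a similarity matrix. The toy data, n_clusters, and linkage="average" are assumptions for illustration. Newer scikit-learn releases accept this argument under the name metric instead of affinity, so the sketch inspects the constructor signature and uses whichever name is available.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Toy data (illustrative only): two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.3, 0.1], [0.1, 0.4],
              [5.0, 5.0], [5.2, 5.1], [5.1, 5.5]])

# Distances, not similarities: larger values mean the samples are further apart.
D = pairwise_distances(X, metric="euclidean")

# Assumption: the installed release calls this argument either "metric"
# (newer releases) or "affinity" (the name discussed in this thread).
params = inspect.signature(AgglomerativeClustering).parameters
param_name = "metric" if "metric" in params else "affinity"

# "ward" linkage only accepts euclidean, so another linkage is needed here.
model = AgglomerativeClustering(
    n_clusters=2, linkage="average", **{param_name: "precomputed"}
)
labels = model.fit_predict(D)
```

Passing a similarity matrix here instead would invert the meaning of "close", which is the confusion the parameter name invites.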

    Users may be unsure whether they should provide a distance
    or a similarity matrix to AgglomerativeClustering.
    The documentation of fit and fit_predict can easily be updated.
    Renaming the 'affinity' parameter is more difficult,
    since it involves an API change.


    What do you think of these potential documentation updates?

    Cheers,

    Anaël Beaugnon

    _______________________________________________
    scikit-learn mailing list
    scikit-learn@python.org <mailto:scikit-learn@python.org>
    https://mail.python.org/mailman/listinfo/scikit-learn
    <https://mail.python.org/mailman/listinfo/scikit-learn>



