[Scikit-learn-general] DBSCAN demo's input

Lars Buitinck Tue, 13 Mar 2012 14:05:59 -0700

Hi all,

A colleague approached me today asking how the scikit-learn DBSCAN
algorithm should be applied, and I must admit that the documentation
and example were confusing even to me. The fit docstring says


    X: array [n_samples, n_samples] or [n_samples, n_features]
        Array of distances between samples, or a feature array.
        The array is treated as a feature array unless the metric is given as
        'precomputed'.

However, the online demo [1] does the following:

    D = distance.squareform(distance.pdist(X))
    S = 1 - (D / np.max(D))

    db = DBSCAN().fit(S, eps=0.95, min_samples=10)

which uses a similarity matrix rather than a feature matrix as input
without passing metric="precomputed". Am I missing some interesting
clustering trick here, or is this a bug? I tried running the example
with the original feature matrix X (without tuning the parameters) and
it gave different output: all points were considered a single cluster
with no outliers.
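
For comparison, here is a minimal sketch of the two input modes the
docstring describes. It assumes a recent scikit-learn in which eps and
min_samples are constructor arguments (the 0.10 demo passes them to
fit), and the toy data and parameter values are made up purely for
illustration, so they are not the demo's.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight blobs plus one far-away point.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
    [[20.0, 20.0]],
])

# Mode 1: X is a feature array [n_samples, n_features]; DBSCAN computes
# the pairwise distances itself (Euclidean by default).
labels_features = DBSCAN(eps=0.5, min_samples=5).fit(X).labels_

# Mode 2: a precomputed [n_samples, n_samples] matrix of distances
# (not similarities), announced with metric="precomputed".
D = distance.squareform(distance.pdist(X))
labels_precomputed = DBSCAN(
    eps=0.5, min_samples=5, metric="precomputed"
).fit(D).labels_
```

On this toy data both calls should give the same labelling, since they
work from the same Euclidean distances. The demo's call matches neither
mode: S holds similarities and is passed without metric="precomputed".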

TIA,
Lars

[1] http://scikit-learn.org/0.10/auto_examples/cluster/plot_dbscan.html

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
