Re: [Scikit-learn-general] DBSCAN demo's input

Robert Layton Tue, 13 Mar 2012 14:36:42 -0700

On 14 March 2012 08:05, Andreas wrote:

> Hi Lars.
> All I can say is that it worked for me by passing X directly:
>
> http://scikit-learn.org/dev/auto_examples/cluster/plot_cluster_comparison.html
>
> I'm on a deadline right now; hopefully I'll have time to work on Olivier's
> "quadratic_fit" (or whatever) proposal afterward.
>
> Cheers,
> Andy
>
>
> On 03/13/2012 10:05 PM, Lars Buitinck wrote:
> > Hi all,
> >
> > A colleague approached me today asking how the scikit-learn DBSCAN
> > algorithm should be applied, and I must admit that the documentation
> > and example were confusing even to me. The fit docstring says
> >
> >      X: array [n_samples, n_samples] or [n_samples, n_features]
> >          Array of distances between samples, or a feature array.
> >          The array is treated as a feature array unless the metric is
> >          given as 'precomputed'.
> >
> > However, the online demo [1] does the following:
> >
> >      D = distance.squareform(distance.pdist(X))
> >      S = 1 - (D / np.max(D))
> >
> >      db = DBSCAN().fit(S, eps=0.95, min_samples=10)
> >
> > which uses a similarity matrix rather than a feature matrix as input
> > without passing metric="precomputed". Am I missing some interesting
> > clustering trick here, or is this a bug? I tried running the example
> > with the original feature matrix X (without tuning the parameters) and
> > it gave different output: all points were considered a single cluster
> > with no outliers.
> >
> > TIA,
> > Lars
> >
> > [1] http://scikit-learn.org/0.10/auto_examples/cluster/plot_dbscan.html
> >
> >
> _______________________________________________
> Scikit-learn-general mailing list
> scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>


Lars, you are right, it should have metric='precomputed' in it.
However, by passing the distance matrix (a similarity matrix, in the demo)
without a metric, the features become "distance to point i", which act as
a sort of meta-feature anyway, so fitting still works. This means that
DBSCAN runs with a distance matrix as input, even though that is probably
not what you wanted. I believe that some classifiers use this concept,
although I can't recall which ones.
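To make the "meta-features" point concrete, here is a minimal sketch, assuming a recent scikit-learn in which eps and min_samples are constructor arguments (the 0.10 demo quoted above passed them to fit). When a square matrix is passed without metric="precomputed", row i is treated as an ordinary feature vector (point i's distances to every point), and DBSCAN computes Euclidean distances between those rows. The toy data and eps=2.0 are arbitrary illustrative choices.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Toy data; the blob layout and eps below are illustrative choices only.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.4, random_state=0)
D = distance.squareform(distance.pdist(X))  # (n_samples, n_samples) distances

# Without metric="precomputed", each row of D is a feature vector of length
# n_samples: "distance to point 0", "distance to point 1", ...
meta = DBSCAN(eps=2.0, min_samples=10).fit(D)

# The same thing spelled out: Euclidean distances *between rows of D*,
# passed explicitly as precomputed distances.
D_rows = distance.squareform(distance.pdist(D))
explicit = DBSCAN(eps=2.0, min_samples=10, metric="precomputed").fit(D_rows)

# Should agree, barring floating-point ties exactly at eps.
print(np.array_equal(meta.labels_, explicit.labels_))
```

So the fit "works", but eps is then a threshold on distances between distance profiles, not on distances between the original points.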

I thought about this problem when I created DBSCAN, e.g. adding a test for
a square matrix with a zero diagonal. However, that is a rather specific
test and I wasn't sure about it. Thoughts?
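A sketch of the kind of check described above, assuming NumPy. The helper name looks_like_distance_matrix and the extra symmetry and non-negativity conditions are hypothetical additions for illustration, not scikit-learn API.

```python
import numpy as np


def looks_like_distance_matrix(X, atol=1e-12):
    """Hypothetical helper, not part of scikit-learn.

    Returns True if X is square, has a (near-)zero diagonal, and is
    symmetric and non-negative, i.e. it plausibly holds pairwise
    distances rather than one feature vector per row.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        return False  # a distance matrix must be (n_samples, n_samples)
    return bool(
        np.allclose(np.diag(X), 0.0, atol=atol)  # d(x, x) == 0
        and np.allclose(X, X.T, atol=atol)        # d(x, y) == d(y, x)
        and (X >= -atol).all()                    # distances are >= 0
    )


# Usage: a feature matrix, its distance matrix, and a demo-style similarity.
rng = np.random.RandomState(0)
X = rng.rand(20, 2)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
S = 1 - D / D.max()

print(looks_like_distance_matrix(X))  # False: not square
print(looks_like_distance_matrix(D))  # True
print(looks_like_distance_matrix(S))  # False: the diagonal of S is 1, not 0
```

Note that the demo's similarity matrix S would not trip such a check, since its diagonal is all ones, and a square feature matrix could still produce a false positive, which illustrates how specific the test is.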

I'll fix this example later this week to have metric="precomputed".
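A sketch of what the corrected call might look like, assuming a recent scikit-learn (constructor arguments rather than the 0.10 fit(S, eps=..., min_samples=...) form). With metric="precomputed", eps is a distance threshold, so it has to be chosen on the scale of the distances rather than reused from the similarity-based demo; the data and eps=0.3 here are placeholders. Passing precomputed Euclidean distances should give the same labels as passing X directly with the default metric, which matches Andy's report that passing X worked.

```python
import numpy as np
from scipy.spatial import distance
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Placeholder data and parameters, chosen only for illustration.
X, _ = make_blobs(n_samples=300, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.4, random_state=0)

# Option 1: pass the feature matrix; DBSCAN computes Euclidean distances itself.
db_features = DBSCAN(eps=0.3, min_samples=10).fit(X)

# Option 2: pass an (n_samples, n_samples) distance matrix and say so.
D = distance.squareform(distance.pdist(X))
db_precomputed = DBSCAN(eps=0.3, min_samples=10, metric="precomputed").fit(D)

# Both should agree, barring floating-point ties exactly at eps.
print(np.array_equal(db_features.labels_, db_precomputed.labels_))

# Label -1 marks noise, so it is not counted as a cluster.
labels = db_precomputed.labels_
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("estimated clusters:", n_clusters)
```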

Thanks!

- Robert

-- 

Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
