Hi Lars. All I can say is that it worked for me when I passed X directly: http://scikit-learn.org/dev/auto_examples/cluster/plot_cluster_comparison.html
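The sketch below makes the feature-array vs. distance-matrix distinction concrete. It is a minimal NumPy DBSCAN written only for illustration, not scikit-learn's implementation; `dbscan_sketch` and `pairwise_euclidean` are hypothetical helpers. X is treated as a feature array unless metric="precomputed", in which case it must be a distance matrix where small values mean "close". A similarity matrix such as S = 1 - D / max(D) reverses that, so passing it as "precomputed" would make the most similar points look farthest apart. (In recent scikit-learn releases eps and min_samples are constructor arguments, e.g. DBSCAN(eps=..., min_samples=..., metric="precomputed").fit(D), rather than fit arguments.)

```python
import numpy as np

NOISE = -1  # label used for outliers


def pairwise_euclidean(X):
    """Dense Euclidean distance matrix for a feature array of shape (n_samples, n_features)."""
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def dbscan_sketch(X, eps, min_samples, metric="euclidean"):
    """Hypothetical minimal DBSCAN.

    X is a feature array unless metric == "precomputed", in which case X must be
    an (n_samples, n_samples) *distance* matrix (small = close), not a similarity matrix.
    """
    X = np.asarray(X, dtype=float)
    D = X if metric == "precomputed" else pairwise_euclidean(X)
    n = D.shape[0]
    # Each neighborhood includes the point itself (distance 0).
    neighbors = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, NOISE)
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unassigned core point.
        if labels[i] != NOISE or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == NOISE:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels


# Usage: two well-separated blobs, clustered from features and from precomputed distances.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(5.0, 0.2, size=(20, 2))])
labels_from_features = dbscan_sketch(X, eps=1.0, min_samples=5)
D = pairwise_euclidean(X)
labels_from_distances = dbscan_sketch(D, eps=1.0, min_samples=5, metric="precomputed")
```

Both calls should give identical labels, since the "precomputed" path receives exactly the distances the feature path computes internally; eps is then a distance threshold in the units of X.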
I'm up against a deadline right now; hopefully I'll have time to work on Olivier's "quadratic_fit" (or whatever) proposal afterward.

Cheers,
Andy

On 03/13/2012 10:05 PM, Lars Buitinck wrote:
> Hi all,
>
> A colleague approached me today asking how the scikit-learn DBSCAN
> algorithm should be applied, and I must admit that the documentation
> and example were confusing even to me. The fit docstring says
>
>     X: array [n_samples, n_samples] or [n_samples, n_features]
>         Array of distances between samples, or a feature array.
>         The array is treated as a feature array unless the metric is given as
>         'precomputed'.
>
> However, the online demo [1] does the following:
>
>     D = distance.squareform(distance.pdist(X))
>     S = 1 - (D / np.max(D))
>
>     db = DBSCAN().fit(S, eps=0.95, min_samples=10)
>
> which uses a similarity matrix rather than a feature matrix as input
> without passing metric="precomputed". Am I missing some interesting
> clustering trick here, or is this a bug? I tried running the example
> with the original feature matrix X (without tuning the parameters) and
> it gave different output: all points were considered a single cluster
> with no outliers.
>
> TIA,
> Lars
>
> [1] http://scikit-learn.org/0.10/auto_examples/cluster/plot_dbscan.html
