Re: [Scikit-learn-general] DBSCAN

2014-07-17 Thread Robert Layton
Hi Roberto, From the docs: X: array [n_samples, n_samples] or [n_samples, n_features] Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as 'precomputed'. In most cases, X is the
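
The two readings of X described in the docstring can be sketched as follows. This is a minimal illustration rather than code from the thread: the toy points, the eps and min_samples values, and the use of pairwise_distances to build the distance matrix are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Toy data (assumed for illustration): two tight groups and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])

# Reading 1: X is a feature array of shape [n_samples, n_features].
labels_features = DBSCAN(eps=0.3, min_samples=3).fit(X).labels_

# Reading 2: pass a square distance matrix of shape [n_samples, n_samples]
# and say so explicitly with metric="precomputed".
D = pairwise_distances(X, metric="euclidean")
labels_precomputed = DBSCAN(eps=0.3, min_samples=3,
                            metric="precomputed").fit(D).labels_

print(labels_features)
print(labels_precomputed)
```

With Euclidean distances both calls describe the same geometry, so the labels agree; the isolated point is reported as noise (-1).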

[Scikit-learn-general] DBSCAN

2014-07-17 Thread Pagliari, Roberto
When using DBSCAN as in the examples: db = DBSCAN(eps=0.3, min_samples=10).fit(X) I'm not sure if I understand what X is. Is X[i][j] supposed to be some sort of measure from node i to node j? If so, does X need to be normalized, or will DBSCAN scale the values accordingly? Thank you, -
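
On the normalization question: DBSCAN compares distances against eps as given and does not rescale the input itself, so features on very different scales can dominate the distance. A rough sketch of the effect, assuming made-up data and using StandardScaler as one possible preprocessing choice (the thread does not prescribe it):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Toy data (assumed): the second column is the first one multiplied by 1000,
# so it dominates any Euclidean distance computed on the raw values.
X = np.array([[0.0, 0.0], [0.1, 100.0], [0.2, 200.0],
              [10.0, 10000.0], [10.1, 10100.0], [10.2, 10200.0]])

# On raw values every pair of points is at least 100 apart, far beyond eps.
raw_labels = DBSCAN(eps=0.5, min_samples=3).fit(X).labels_

# After standardizing each column to zero mean and unit variance,
# the same eps separates the two groups.
X_scaled = StandardScaler().fit_transform(X)
scaled_labels = DBSCAN(eps=0.5, min_samples=3).fit(X_scaled).labels_

print(raw_labels)
print(scaled_labels)
```

Whether to scale, and how, depends on what the features mean; the point of the sketch is only that eps is interpreted in the units of whatever is passed in.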

Re: [Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Andreas Mueller
On 05/08/2012 10:41 AM, Gael Varoquaux wrote: > On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote: >> I presumed there are valid reasons for using a numpy array, and represent >> error points as -1. > Reasons for using a numpy array are that it is faster and more memory > effici

Re: [Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Gael Varoquaux
On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote: > I presumed there are valid reasons for using a numpy array, and represent > error points as -1. Reasons for using a numpy array are that it is faster and more memory efficient than a list. Gael -

[Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Félix-Antoine Fortin
Hi, I have recently used the DBSCAN implementation of scikit-learn, and I have a "quick" question. Currently, noise points are labelled as -1 in a numpy array. From my point of view, clustering labels can be used, for example, as an index into a sequence. However, in Python -1 is still a valid ind
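
The indexing pitfall raised here can be reproduced without running DBSCAN at all. The sketch below uses a hand-written label array and hypothetical cluster names; the boolean mask is one possible way to guard against it, not something the thread recommends.

```python
import numpy as np

# Hypothetical labels in the DBSCAN convention: -1 marks a noise point.
labels = np.array([0, 0, 1, -1, 1])
cluster_names = ["red", "blue"]  # one made-up name per cluster

# Pitfall: -1 is a valid Python index, so the noise point silently
# picks up the name of the last cluster.
surprising = cluster_names[labels[3]]

# One way to guard against it: mask the noise points before indexing.
is_noise = labels == -1
names = [None if noise else cluster_names[label]
         for label, noise in zip(labels, is_noise)]
clustered_indices = np.flatnonzero(~is_noise)

print(surprising)
print(names)
print(clustered_indices)
```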

Re: [Scikit-learn-general] DBSCAN demo's input

2012-03-13 Thread Robert Layton
On 14 March 2012 08:54, Lars Buitinck wrote: > 2012/3/13 Robert Layton : > > Lars, you are right, it should have metric='precomputed' in it. > > However by passing the distance matrix without a metric, the features > become > > "distance to point i", which act as sort of meta-features anyway, > a

Re: [Scikit-learn-general] DBSCAN demo's input

2012-03-13 Thread Lars Buitinck
2012/3/13 Robert Layton : > Lars, you are right, it should have metric='precomputed' in it. > However by passing the distance matrix without a metric, the features become > "distance to point i", which act as sort of meta-features anyway, allowing > training to happen. This means that it works with
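
To make the distinction discussed here concrete, the sketch below passes the same distance matrix twice, once with metric='precomputed' and once without. The 1-D toy data and the eps and min_samples values are assumptions chosen so the difference is easy to check by hand.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy 1-D positions (assumed): two groups of three equally spaced points.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])  # pairwise distance matrix, shape (6, 6)

# With metric="precomputed", D[i, j] is used directly as the distance.
as_distances = DBSCAN(eps=1.5, min_samples=3,
                      metric="precomputed").fit(D).labels_

# Without it, each row of D is treated as a 6-dimensional feature vector
# ("distance to point i"), and Euclidean distances between rows are used.
# Neighbouring rows here are sqrt(6) apart, so the same eps finds nothing.
as_features = DBSCAN(eps=1.5, min_samples=3).fit(D).labels_

print(as_distances)
print(as_features)
```

Fitting on the rows still runs without error, but eps then refers to distances between rows of D rather than to the original distances, so the same parameters can give a very different clustering.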

Re: [Scikit-learn-general] DBSCAN demo's input

2012-03-13 Thread Robert Layton
On 14 March 2012 08:05, Andreas wrote: > Hi Lars. > All I can say is that it worked for me by passing X directly: > > http://scikit-learn.org/dev/auto_examples/cluster/plot_cluster_comparison.html > > I'm deadlining right now, hopefully I have time to work on Olivier's > "quadratic_fit" (or whate

Re: [Scikit-learn-general] DBSCAN demo's input

2012-03-13 Thread Andreas
Hi Lars. All I can say is that it worked for me by passing X directly: http://scikit-learn.org/dev/auto_examples/cluster/plot_cluster_comparison.html I'm deadlining right now, hopefully I have time to work on Olivier's "quadratic_fit" (or whatever) proposal afterward. Cheers, Andy On 03/13/2012

[Scikit-learn-general] DBSCAN demo's input

2012-03-13 Thread Lars Buitinck
Hi all, A colleague approached me today asking how the scikit-learn DBSCAN algorithm should be applied, and I must admit that the documentation and example were confusing even to me. The fit docstring says X: array [n_samples, n_samples] or [n_samples, n_features] Array of distances bet