Re: [Scikit-learn-general] DBSCAN

2014-07-17 Thread Robert Layton
Hi Roberto, from the docs: "X: array [n_samples, n_samples] or [n_samples, n_features]. Array of distances between samples, or a feature array. The array is treated as a feature array unless the metric is given as 'precomputed'." In most cases, X is the
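
A minimal sketch of the two calling conventions described in that docstring, assuming a toy 2-D feature array (the variable names and data below are only illustrative):

    import numpy as np
    from sklearn.cluster import DBSCAN
    from sklearn.metrics import pairwise_distances

    # Illustrative feature array: shape (n_samples, n_features)
    X = np.random.RandomState(0).rand(100, 2)

    # Default: X is treated as a feature array.
    db_features = DBSCAN(eps=0.3, min_samples=10).fit(X)

    # Alternatively, pass a square (n_samples, n_samples) distance matrix
    # and declare the metric as precomputed.
    D = pairwise_distances(X)
    db_precomputed = DBSCAN(eps=0.3, min_samples=10, metric='precomputed').fit(D)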

[Scikit-learn-general] DBSCAN

2014-07-17 Thread Pagliari, Roberto
When using DBSCAN as in the examples, e.g. db = DBSCAN(eps=0.3, min_samples=10).fit(X), I'm not sure I understand what X is. Is X[i][j] supposed to be some sort of measure from node i to node j? If so, does X need to be normalized, or will DBSCAN scale the values accordingly? Thank you, -
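
On the normalization question: DBSCAN does not rescale its input, so eps is interpreted in the units of the feature space. A minimal sketch of standardizing features before clustering, assuming X is a feature array (the data and parameter values are illustrative):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import DBSCAN

    X = np.random.RandomState(0).rand(100, 2)         # illustrative feature array
    X_scaled = StandardScaler().fit_transform(X)      # zero mean, unit variance per column
    db = DBSCAN(eps=0.3, min_samples=10).fit(X_scaled)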

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-17 Thread Christian Jauvin
> I think the encoders should all be able to deal with unknown labels.
> The thing about the extra single value is that you don't have a column to map it to.
> How would you use the extra value in LabelBinarizer or OneHotEncoder?

You're right, and this points to a difference between what PR #324

Re: [Scikit-learn-general] LabelBinarizer change between 0.14 and 0.15

2014-07-17 Thread Andy
I think the encoders should all be able to deal with unknown labels. The thing about the extra single value is that you don't have a column to map it to. How would you use the extra value in LabelBinarizer or OneHotEncoder? For LabelEncoder I think it would make sense. On 07/17/2014 12:59 AM, Ch
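
A rough sketch of the distinction being made here, assuming the classes seen at fit time are ['cat', 'dog'] and 'fish' is unseen; the helper that maps unknowns to an extra integer is purely illustrative, not an existing scikit-learn option:

    import numpy as np
    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder().fit(['cat', 'dog'])          # le.classes_ == ['cat', 'dog']

    # For LabelEncoder an extra integer (len(le.classes_)) is a natural
    # place to send labels that were not seen during fit.
    def encode_with_unknown(le, labels):
        known = set(le.classes_)
        return np.array([le.transform([l])[0] if l in known else len(le.classes_)
                         for l in labels])

    print(encode_with_unknown(le, ['dog', 'fish']))  # -> [1 2]

    # For LabelBinarizer / OneHotEncoder the same trick has no obvious analogue:
    # the fitted output has exactly one column per known class, so there is no
    # column left to switch on for 'fish'.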