Re: [Scikit-learn-general] k-nearest neighbors for complex data

2014-09-17 Thread Mohamed-Rafik Bouguelia
Hi,

You cannot use complex numbers; they should be real numbers. Each data point should be in R^d (where d is the dimensionality).

2014-09-17 19:45 GMT+02:00 Neal Becker:
> I just tried k-nearest neighbors where the data are complex. It doesn't seem to work correctly.
>
> I tried
>
> impo
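One common way to follow this advice is to map each complex value to the real pair (Re, Im), so that every point lies in R^2 and Euclidean distance matches the distance in the complex plane. The sketch below is only an illustration: the four sample points and the helper `nearest_label` are hypothetical, and the brute-force search stands in for a real nearest-neighbor estimator.

```python
import numpy as np

# Hypothetical complex-valued points (stand-ins for a constellation).
points = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])

# Map each complex number to a real 2-D feature vector [Re, Im].
X = np.column_stack([points.real, points.imag])
y = np.arange(len(points))


def nearest_label(query):
    """Return the label of the training point closest to a complex query."""
    q = np.array([query.real, query.imag])
    distances = np.linalg.norm(X - q, axis=1)  # Euclidean distance in R^2
    return int(y[np.argmin(distances)])


print(X.shape)
print(nearest_label(0.9 - 1.2j))
```

The real-valued array `X` built this way has the shape (n_samples, 2) that a real-valued estimator expects.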

[Scikit-learn-general] k-nearest neighbors for complex data

2014-09-17 Thread Neal Becker
I just tried k-nearest neighbors where the data are complex. It doesn't seem to work correctly. I tried

    import numpy as np
    from const64apsk import gen_constellation_64apsk
    const = gen_constellation_64apsk('3/4')
    X = [[e] for e in const]
    y = np.arange(64)
    from sklearn.neighbors import KNeigh

Re: [Scikit-learn-general] oob_score_ for random forests for regression

2014-09-17 Thread Josh Wasserstein
Thanks Arnaud.

Josh

On Fri, Sep 12, 2014 at 2:03 PM, Arnaud Joly wrote:
> Here is the link to the issue
> https://github.com/scikit-learn/scikit-learn/issues/3455
>
> Arnaud
>
> On 12 Sep 2014, at 20:01, Arnaud Joly wrote:
>
> If you want to work on custom oob scoring, there is an issue opened
>

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-17 Thread Mathieu Blondel
Andy,

Indeed, this will mostly depend on the number of public utils we have. However, using submodules can help structure our public utils.

M.

On Wed, Sep 17, 2014 at 6:32 PM, Andy wrote:
> On 09/15/2014 03:40 PM, Mathieu Blondel wrote:
>
>> lightning is using the following utils:
>>
>> - chec

Re: [Scikit-learn-general] random forest different misclassification cost

2014-09-17 Thread Andy
Hi Maksym.

If you only want the loss to be reweighted according to class, you can simply use sample_weight to give more emphasis to the samples of this class. If you want some other loss function, you might need to specify your own splitting criterion.

Cheers,
Andy

On 09/16/2014 08:39 AM,
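The class-based reweighting described above can be sketched as a per-sample weight lookup. The labels and the cost values here are hypothetical; the only scikit-learn detail assumed is that `fit` accepts a `sample_weight` array-like with one entry per training sample.

```python
# Hypothetical labels and per-class costs: errors on class 1 count 5x as much.
y = [0, 0, 1, 0, 1]
class_cost = {0: 1.0, 1: 5.0}

# One weight per sample, looked up from the sample's class.
sample_weight = [class_cost[label] for label in y]
print(sample_weight)

# With scikit-learn installed, the weights would be passed at fit time, e.g.
#   RandomForestClassifier().fit(X, y, sample_weight=sample_weight)
```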

Re: [Scikit-learn-general] Backward compat policy in utils

2014-09-17 Thread Andy
On 09/15/2014 03:40 PM, Mathieu Blondel wrote:
> lightning is using the following utils:
>
> - check_random_state
> - safe_sparse_dot
> - shuffle
> - safe_mask
> - sklearn.utils.testing.*
>
> The latter is not a big deal but I like importing assertions from the same place.
>
> On a second thought,

[Scikit-learn-general] CountVectorizer token pattern

2014-09-17 Thread Nathan Breit
I was wondering what the rationale is for making the default token pattern for the CountVectorizer require *2* or more alphanumeric characters to form a token. This was not intuitive default behavior for me, so I ended up with a bug where some strings in my vocabulary like "Hepatitis A" were not co
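The behavior described here can be reproduced with the standard `re` module alone. The default `token_pattern` of `CountVectorizer` is `(?u)\b\w\w+\b`, which only matches runs of two or more word characters; the relaxed pattern below is one possible alternative, not a recommendation from the thread. Note that `CountVectorizer` also lowercases text by default, which this sketch does not do.

```python
import re

# Default CountVectorizer token pattern: two or more word characters.
default_pattern = r"(?u)\b\w\w+\b"
# Relaxed alternative that also accepts single-character tokens.
relaxed_pattern = r"(?u)\b\w+\b"

text = "Hepatitis A"

default_tokens = re.findall(default_pattern, text)
relaxed_tokens = re.findall(relaxed_pattern, text)

print(default_tokens)  # ['Hepatitis']
print(relaxed_tokens)  # ['Hepatitis', 'A']
```

The relaxed pattern could be supplied through the vectorizer's `token_pattern` argument so that "A" is kept as a token.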