Re: [Scikit-learn-general] problem with parallel computing on windows xp

2013-07-31 Thread Gael Varoquaux
On Thu, Aug 01, 2013 at 11:46:09AM +0800, Shuo Wang wrote: > ImportError: [joblib] Attempting to do parallel computing without protecting > your import on a system that does not support forking. To use > parallel-computing in a script, you must protect you main loop using "if > __name__ == '__main__
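The guard that the error message asks for can be sketched as follows. This is a minimal illustration that uses only the standard-library multiprocessing module as a stand-in for an estimator built with n_jobs=4; the names square and main are hypothetical and do not come from the thread.

```python
import multiprocessing


def square(x):
    """Stand-in for the work each parallel job would do."""
    return x * x


def main(n_jobs=4):
    # Worker processes are created here. On systems without fork
    # (such as Windows) each worker re-imports this module.
    with multiprocessing.Pool(processes=n_jobs) as pool:
        return pool.map(square, range(8))


if __name__ == "__main__":
    # Only the process that runs the script enters this block, so the
    # re-importing workers do not try to start pools of their own.
    print(main())
```

In a scikit-learn script the same pattern would presumably apply: build and fit the estimator inside a function, and call that function only from under the guard.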

Re: [Scikit-learn-general] Scikit-learn alpha release

2013-07-31 Thread Gael Varoquaux
Hey Chris, This is good news. The problems are fairly minor. Don't worry about the issue. These tests failing are numerically unstable ones. We'll see what we can do about them, but they are not release blockers. The good news is that we don't have major building or linking problem. Thanks a lot!

Re: [Scikit-learn-general] Identical scores across repetitions of repeated CV ?? (figure included)

2013-07-31 Thread Joel Nothman
I think all those results correspond to the RBF kernel. You have far too few samples to learn an RBF model, so it's stored trivial coefficients independent of C and gamma. On Thu, Aug 1, 2013 at 1:56 PM, Josh Wasserstein wrote: > Hi, > > I am noticing that for some models in my grid search I get

[Scikit-learn-general] Identical scores across repetitions of repeated CV ?? (figure included)

2013-07-31 Thread Josh Wasserstein
Hi, I am noticing that for some models in my grid search I get virtually the same exact results across 100 repetitions of CV. Is this normal? In case it matters, I am working with ~30 data points (I know, it's a small dataset) with ~5 dimensions. Below are the details of the configuration that I

[Scikit-learn-general] problem with parallel computing on windows xp

2013-07-31 Thread Shuo Wang
Hi, I am trying to run 4 jobs on Windows XP with sklearn 0.13.1: model = RandomForestRegressor(n_estimators=500, compute_importances=True, n_jobs=4) I am receiving the following error: Traceback (most recent call last): File "<string>", line 1, in <module> File "C:\Python27\lib\multiprocessing\forking.py",

[Scikit-learn-general] Scikit-learn alpha release

2013-07-31 Thread Gael Varoquaux
As Christoph, I am contacting you because you are the guy that rocks and provides fantastically useful binaries of many scientific-computing packages under Windows. We (the scikit-learn team) are going to release a new version of scikit-learn. I have tagged the alpha release and uploaded the sourc

Re: [Scikit-learn-general] GridSearchCV with multi-label: ROC-AUC-equivalent metrics

2013-07-31 Thread Arnaud Joly
It's what they have done in the mulan library. Arnaud On 19 Jul 2013, at 13:24, Olivier Grisel wrote: > 2013/7/19 Arnaud Joly : >> You can probably average the precision recall curve >> or use some ranking metrics [1]. >> >> Arnaud >> >> [1] Mining Multi-label Data >> http://lkm.fri.uni-lj.s

Re: [Scikit-learn-general] random forest string data

2013-07-31 Thread Oğuz Yarımtepe
{"word": vocabulary[word], ...} The trained data is like [[0.0, 1.0, 'xxx', 'yyy', '13.0', ...], ] so when I use DictVectorizer and run fit_transform it will create an array, something like array([[ 1., 0.], [ 0., 1.]]), with a different shape and data. I am not sure how I will repla

Re: [Scikit-learn-general] random forest string data

2013-07-31 Thread Lars Buitinck
2013/7/31 Oğuz Yarımtepe : > How will i use DictVectorizer for string values above? It won't do categorical integer coding directly. You can keep a separate dict of the string values, say vocabulary, then feed DictVectorizer dicts of the form {"word": vocabulary[word], ...} -- Lars Buitinck
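The separate dict suggested above could look like the sketch below. It covers only the integer-coding step in plain Python; the helper build_vocabulary and the sample strings are hypothetical, chosen for illustration.

```python
def build_vocabulary(values):
    """Map each distinct string to an integer code, in first-seen order."""
    vocabulary = {}
    for value in values:
        if value not in vocabulary:
            vocabulary[value] = len(vocabulary)
    return vocabulary


words = ["xxx", "yyy", "xxx", "zzz"]  # hypothetical string column
vocabulary = build_vocabulary(words)

# One dict per sample, in the {"word": vocabulary[word], ...} form.
samples = [{"word": vocabulary[word]} for word in words]
print(samples)
```

Dicts of this shape can then be passed to DictVectorizer's fit_transform, which keeps a numeric value as a single column instead of expanding it into one-hot columns.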

Re: [Scikit-learn-general] random forest string data

2013-07-31 Thread Oğuz Yarımtepe
On Mon, Jul 29, 2013 at 12:19 AM, Ross Boucher wrote: > Interesting, I've been using DictVectorizer (and one hot coded categorical > data) with Random Forests and getting decent results. Is this just > coincidental, and will I see better results if I combine the categorical > data into a single c

Re: [Scikit-learn-general] random forest string data

2013-07-31 Thread Oğuz Yarımtepe
Hi, > What you get from DictVectorizer is a sparse matrix containing one-hot > coded categorical values (booleans). Random forests don't support > those, but fortunately they (should) handle categorical values without > one-hot coding, so you do something like > > I tried with string values and

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Jaques Grobler
Makes sense to me to deprecate here +1 2013/7/31 Olivier Grisel > +1 for deprecating boolean mask for CV as well.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Olivier Grisel
+1 for deprecating boolean mask for CV as well.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Gael Varoquaux
On Wed, Jul 31, 2013 at 09:14:15AM +1000, Joel Nothman wrote: > What is the intention behind indices=False; Old design oversight (aka historical reasons). > why not deprecate it and simplify the API and code? (And speed up > indexing by using np.take.) +1! Making things simpler is always better.

Re: [Scikit-learn-general] cross-validation and indices=False

2013-07-31 Thread Alexandre Gramfort
hi, indeed we could stick to indices and use np.take whenever possible. In [33]: A = np.random.randn(500, 500) In [34]: idx = np.unique(np.random.randint(0, 499, 400)) In [35]: mask = np.zeros(500, dtype=np.bool) In [36]: mask[idx] = True In [37]: %timeit A[idx] 1000 loops, best of 3: 1.79 ms per
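The three selection styles compared in this session can be checked for equivalence with a short script. This is a sketch under assumptions: it uses a seeded RandomState and plain bool as the dtype (which works across NumPy versions), and it makes no claim about timings, which depend on the machine and the NumPy release.

```python
import numpy as np

rng = np.random.RandomState(0)  # seeded so the run is reproducible
A = rng.randn(500, 500)

# Sorted, unique row indices, as in the session above.
idx = np.unique(rng.randint(0, 499, 400))

# Boolean mask selecting the same rows.
mask = np.zeros(500, dtype=bool)
mask[idx] = True

by_index = A[idx]                  # integer (fancy) indexing
by_mask = A[mask]                  # boolean-mask indexing
by_take = np.take(A, idx, axis=0)  # explicit take along the row axis

print(by_index.shape, by_mask.shape, by_take.shape)
```

Because idx is sorted and free of duplicates, all three results contain the same rows in the same order. With unsorted or repeated indices, the mask version would differ from the other two.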