Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Andreas Mueller
Sorry for not being able to help you with the actual problem, but another hint: I have a pull request for randomly sampling the parameter space, which should be much more efficient in a model with so many parameters. https://github.com/scikit-learn/scikit-learn/pull/1194
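
For readers following this thread today: randomized parameter sampling is available in current scikit-learn releases as `RandomizedSearchCV`. The sketch below is not the code from the pull request; the dataset, the estimator and the sampled ranges are all assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption: any (X, y) classification set works).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Continuous parameters are given as distributions, discrete ones as lists.
param_distributions = {
    "alpha": loguniform(1e-6, 1e-1),
    "penalty": ["l2", "l1", "elasticnet"],
    "loss": ["hinge", "modified_huber"],
}

# n_iter fixes the budget: 10 sampled settings instead of a full grid.
search = RandomizedSearchCV(
    SGDClassifier(random_state=0),
    param_distributions,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

The cost is set by `n_iter`, not by the product of all parameter value counts, which is why sampling scales better when there are many parameters.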

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Andreas Mueller
>> 2) how would I go about grid search over different vectorizers (e.g. >> CountVectorizer(analyzer="word"), CountVectorizer(analyzer="char_wb"), and a >> FeatureUnion of the two)? > You could always use a FeatureUnion and give it different TransformerLists via the GridSearchCV (at least I think t
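
One way to search over different vectorizers, sketched here with the API of current scikit-learn releases (which may differ from what was available in 2012), is to treat the whole pipeline step as a grid parameter. The toy corpus, the step names `vect` and `clf`, and the `C` values are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Toy corpus (assumption): four documents per class.
docs = [
    "the cat sat on the mat", "dogs chase the cat",
    "a cat and a dog play", "my dog likes the park",
    "stocks fell on monday", "the market rallied today",
    "investors sold their stocks", "the bank raised rates",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

word = CountVectorizer(analyzer="word")
char = CountVectorizer(analyzer="char_wb")
both = FeatureUnion([
    ("word", CountVectorizer(analyzer="word")),
    ("char", CountVectorizer(analyzer="char_wb")),
])

pipe = Pipeline([("vect", word), ("clf", LinearSVC())])

# The "vect" key replaces the entire step with each candidate in turn.
param_grid = {"vect": [word, char, both], "clf__C": [0.1, 1.0]}

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

Each candidate vectorizer is cloned and refit per fold, so the three options are compared on equal terms alongside the classifier parameters.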

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Mathieu Blondel
On Fri, Nov 16, 2012 at 3:28 PM, Gael Varoquaux < gael.varoqu...@normalesup.org> wrote: > On Thu, Nov 15, 2012 at 05:07:24PM -0800, Fred Mailhot wrote: > > 1) there are a few LinearSVC options (penalty/loss, penalty/dual) for > which > > certain values are incompatible, but which are not documente

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Gael Varoquaux
On Thu, Nov 15, 2012 at 05:07:24PM -0800, Fred Mailhot wrote: > 1) there are a few LinearSVC options (penalty/loss, penalty/dual) for which > certain values are incompatible, but which are not documented as such...this > makes grid search a bit of a pain. Indeed, they should be documented. Pull re
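
Until the incompatibilities are documented, a possible workaround is to pass a list of grids so that only supported combinations are ever tried. The sketch below uses the parameter values of current scikit-learn releases (`"hinge"` and `"squared_hinge"`), which are an assumption relative to the version discussed in this thread; the synthetic data is also an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid
from sklearn.svm import LinearSVC

# Each dict is expanded separately, so unsupported combinations
# (e.g. penalty="l1" with loss="hinge") are never generated.
valid_linear_svc_grid = [
    {"penalty": ["l2"], "loss": ["hinge"], "dual": [True]},
    {"penalty": ["l2"], "loss": ["squared_hinge"], "dual": [True, False]},
    {"penalty": ["l1"], "loss": ["squared_hinge"], "dual": [False]},
]

combos = list(ParameterGrid(valid_linear_svc_grid))

# Sanity check on synthetic data: every generated setting can be fitted.
X, y = make_classification(n_samples=100, n_features=10, random_state=0)
for params in combos:
    LinearSVC(**params).fit(X, y)
    print(params)
```

The same list of dicts can be passed directly as `param_grid` to `GridSearchCV`.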

Re: [Scikit-learn-general] Data Set on Tutorial: Machine Learning for Astronomy with Scikit-learn

2012-11-15 Thread Leon Palafox
Hello Jake, The error is easy to reproduce, after downloading the data for the file sdss_photoz via the fetch_data script: data=np.load('./sklearn_tutorial/doc/data/sdss_photoz/sdss_photoz.npy') print data.dtype.names # count=0 N=len(data) X
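
A quick way to locate NaN entries in a record array like this one is sketched below. It does not use the real `sdss_photoz.npy` file: the small array and its field names (`u`, `g`, `redshift`) are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical stand-in for the loaded record array (field names assumed).
data = np.array(
    [
        (18.1, 17.5, 0.10),
        (np.nan, 17.9, 0.25),
        (19.0, 18.2, np.nan),
        (18.4, 17.7, 0.05),
    ],
    dtype=[("u", "f8"), ("g", "f8"), ("redshift", "f8")],
)

# Stack the named fields into a plain 2-D float array, one column per field.
columns = np.column_stack([data[name] for name in data.dtype.names])

# Count NaNs per field, then keep only rows without any NaN.
nan_per_field = dict(zip(data.dtype.names, np.isnan(columns).sum(axis=0)))
clean = data[~np.isnan(columns).any(axis=1)]

print(nan_per_field)
print(len(data), "rows before,", len(clean), "rows after")
```

Whether dropping rows is appropriate depends on why the NaNs are there; the counts per field are a first step toward finding that out.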

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Fred Mailhot
I already know that things work with n_jobs=1. I just tried n_jobs=-1 with a few smaller datasets (100 & 1000 items) and things seem to have worked fine (without LinearSVC, see below). Possibly there's something wrong with the larger dataset...investigating now. A couple of points related to grid

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Andreas Mueller
Are you sure the error is related to n_jobs, not a specific classifier? Could you run with n_jobs=1 and a very small training set (like 100 examples or something) and see if it runs through? (Actually I'm totally clueless but that doesn't look like a multiprocessing error to me) On 11/15/201
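
The check suggested above might look like the following sketch: take a small slice of the training set and run the search in a single process, so that any failure cannot come from multiprocessing. The synthetic data, the estimator and the grid are assumptions, not the poster's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Stand-in for the full training set (assumption).
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Keep only the first 100 examples for a fast smoke test.
X_small, y_small = X[:100], y[:100]

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.1, 1.0, 10.0]},
    cv=3,
    n_jobs=1,  # single process: rules out multiprocessing as the cause
)
search.fit(X_small, y_small)
print(search.best_params_)
```

If this runs cleanly, the next step is to grow the subset or raise `n_jobs` one change at a time until the error reappears.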

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Fred Mailhot
Argh, copy-paste error: https://gist.github.com/e2ca1910450819a8a287 As for Accelerate, I'm not 100% sure how to check that (I cloned & ran "setup.py build" and "setup.py install" without making any changes, if memory serves), but this leads me to think "yes": $ otool -L /Users/aboutuser/Development/

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Gael Varoquaux
> I definitely would like to see the term "data mining" stay -- we want > to show up in results for "python data mining" in google. But I > wouldn't mind "applications like data mining", and saying that sklearn > is a "statistical package" or something similar. Maybe we want something like 'keywor

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Robert Layton
On 16 November 2012 00:36, Lars Buitinck wrote: > 2012/11/15 Jaques Grobler : > > @Lars you countered Olivier's paragraph with a quote from Oliver :D hehe > > Oops, I intended to reply to Nelle. Sorry Olivier! :) > > -- > Lars Buitinck > Scientific programmer, ILPS > University of Amsterdam > > >

Re: [Scikit-learn-general] GridSearch example

2012-11-15 Thread Andreas Mueller
Hi Fred. The link is dead for me. Do you link against Accelerate (not sure if this is relevant)? Cheers, Andy On 11/15/2012 08:45 PM, Fred Mailhot wrote: Dear list, I'm using GridSearchCV to do some simple model selection for a text classification task. I've got it working (see below for cave

[Scikit-learn-general] GridSearch example

2012-11-15 Thread Fred Mailhot
Dear list, I'm using GridSearchCV to do some simple model selection for a text classification task. I've got it working (see below for caveat), but I'm not convinced that I'm making the best use of this tool. If someone has the time/inclination, I'd love a set of eyes to check the following gist t

Re: [Scikit-learn-general] test-regress to run all the sklearn regressors with prints and save

2012-11-15 Thread denis
Olivier, actually, SGDRegressor is best on boston (of those that give coefs) so that would be my first choice, for problems big or small. Grid search ? who has the time ? OK ... in fact L1-regularization shrinks coefs and R2 towards 0 but av, max |residuals| get worse -- SGDRegressor boston pen

Re: [Scikit-learn-general] Data Set on Tutorial: Machine Learning for Astronomy with Scikit-learn

2012-11-15 Thread Jake Vanderplas
Hi Leon, I haven't run into any NaN issues, or heard of anyone else having that problem. Can you send the traceback for the specific error you're getting? Thanks Jake On 11/15/2012 04:14 AM, Jaques Grobler wrote: Hi Leon - I hadn't encountered this back when I looked at this. I think @J

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Lars Buitinck
2012/11/15 Jaques Grobler : > @Lars you countered Olivier's paragraph with a quote from Oliver :D hehe Oops, I intended to reply to Nelle. Sorry Olivier! :) -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam -

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Jaques Grobler
@Lars you countered Olivier's paragraph with a quote from Oliver :D hehe That's why I think we could, if we wanna keep the Data Mining solutions in there, just mention that sklearn can be applied to areas like data mining, etc. IMHO :) 2012/11/15 Lars Buitinck > 2012/11/15 Olivier Grisel : > >

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Lars Buitinck
2012/11/15 Olivier Grisel : > I think that using unsupervised model for clustering or using random > forest to rank feature by importance can be part of data mining tasks. > Even building predictive models with a supervised signal can > sometimes be considered data mining. Sure, but as Olivier sa

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Olivier Grisel
I think that using an unsupervised model for clustering or using a random forest to rank features by importance can be part of data mining tasks. Even building predictive models with a supervised signal can sometimes be considered data mining. However, scikit-learn is not a full-fledged data mining soft

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Jaques Grobler
What if we just mention that it can be applied to fields like data-mining etc. Then it doesn't claim to be a data-mining package or library but mentions that it can be used/applied for that. Unless we drop 'Data mining' altogether from there. 2012/11/15 Nelle Varoquaux > > > > On 15 November 2

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Nelle Varoquaux
On 15 November 2012 12:35, Mathieu Blondel wrote: > > > On Thu, Nov 15, 2012 at 8:21 PM, Lars Buitinck wrote: > >> 2012/11/15 Gael Varoquaux : >> > scikit-learn integrates machine learning algorithms in the tightly-knit >> > scientific Python world, building upon numpy, scipy, and matplotlib. It

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Gael Varoquaux
On Thu, Nov 15, 2012 at 08:35:36PM +0900, Mathieu Blondel wrote: > "well-known algorithms" would do the trick too. "reference algorithms"? G

Re: [Scikit-learn-general] Data Set on Tutorial: Machine Learning for Astronomy with Scikit-learn

2012-11-15 Thread Jaques Grobler
Hi Leon - I hadn't encountered this back when I looked at this. I think @JacobVanderPlas would perhaps be best with this since he put that tutorial together. I'm sure he'll be able to help with this. ping @jakevp :) Regards, J 2012/11/15 Leon Palafox > > Hey Guys, > > I was running the dat

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Mathieu Blondel
On Thu, Nov 15, 2012 at 8:21 PM, Lars Buitinck wrote: > 2012/11/15 Gael Varoquaux : > > scikit-learn integrates machine learning algorithms in the tightly-knit > > scientific Python world, building upon numpy, scipy, and matplotlib. It > > provides simple, efficient and effective data mining solu

[Scikit-learn-general] Data Set on Tutorial: Machine Learning for Astronomy with Scikit-learn

2012-11-15 Thread Leon Palafox
Hey Guys, I was running the data set in the Tree Regression Example for the astroml ( http://astroml.github.com/sklearn_tutorial/regression.html#a-simple-method-decision-tree-regression ) and I ran into some NaN values that come from the dataset. Has anyone else encountered this issue, and if so, ho

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Lars Buitinck
2012/11/15 Gael Varoquaux : > scikit-learn integrates machine learning algorithms in the tightly-knit > scientific Python world, building upon numpy, scipy, and matplotlib. It > provides simple, efficient and effective data mining solutions, > accessible to everybody and reusable in various context

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Robert Layton
On 15 November 2012 20:55, Andreas Mueller wrote: > Am 15.11.2012 10:50, schrieb Olivier Grisel: > > Andy, please feel free to add a new page to the documentation named > > "Who uses scikit-learn?" and where we can collect a bunch of > > testimonies (it's interesting not only to collect names of

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Andreas Mueller
Am 15.11.2012 10:50, schrieb Olivier Grisel: > Andy, please feel free to add a new page to the documentation named > "Who uses scikit-learn?" and where we can collect a bunch of > testimonies (it's interesting not only to collect names of companies / > organizations but also what specific component

Re: [Scikit-learn-general] Release schedule for 0.13

2012-11-15 Thread Andreas Mueller
Am 15.11.2012 10:34, schrieb Mathieu Blondel: > Tackling this one would be nice: > https://github.com/scikit-learn/scikit-learn/issues/1327 > > Currently, PassiveAggressiveClassifier is quite slower than Perceptron. > There is a list of issues tagged with the 0.13 milestone: https://github.com/scik

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Olivier Grisel
Andy, please feel free to add a new page to the documentation named "Who uses scikit-learn?" and where we can collect a bunch of testimonies (it's interesting not only to collect names of companies / organizations but also what specific components they use for which kind of problems).

Re: [Scikit-learn-general] "Classic machine learning"

2012-11-15 Thread Jaques Grobler
I like the new version. If we wanted to keep the word `classic` in there, I'd go for something like 'scikit-learn integrates both classic and recent machine learning algorithms in the tightly-knit scientific Python world, building upon numpy, scipy, and matplotlib.` Beyond that I think it's just pe

Re: [Scikit-learn-general] Release schedule for 0.13

2012-11-15 Thread Mathieu Blondel
Tackling this one would be nice: https://github.com/scikit-learn/scikit-learn/issues/1327 Currently, PassiveAggressiveClassifier is quite a bit slower than Perceptron. Mathieu