Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-11 Thread Andreas Mueller
Hi Mathieu. Sorry, busy busy. Thanks for the nudge. Putting the "about us" in the side bar is a good idea I think. Will try to find the time. Also, creating a PR so we can discuss there is pretty much at the top of my todo list :) Cheers, Andy On 03/12/2013 03:06 AM, Mathieu Blondel wrote: > Hello Andy, >

Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

2013-03-11 Thread Joel Nothman
To make it clearer, here's the current implementation's take on the example from the other day:
>>> from sklearn import grid_search, linear_model, datasets
>>> iris = datasets.load_iris()
>>> clf = grid_search.GridSearchCV(linear_model.LogisticRegression(), {'C': [1, 10]}, scoring='prf1')
>>> clf.

Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

2013-03-11 Thread Joel Nothman
PS: notice how easy it would be to add results['elapsed'] = time.time() - start_time into fit_grid_point and voila! a 2d array of elapsed times... On Tue, Mar 12, 2013 at 5:05 PM, Joel Nothman wrote: > An implementation, without backwards compatibility or new tests (but with > tests.test_grid_sea
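The timing idea in this message can be illustrated with a minimal, standard-library-only sketch. It is not the code from Joel's branch: `fit_grid_point` here is a hypothetical stand-in that only mimics the idea of returning a results dict per (parameters, fold) pair, and `run_grid` and `dummy_train_and_score` are names invented for the illustration.

```python
import time


def fit_grid_point(params, fold, train_and_score):
    """Hypothetical stand-in for the fit_grid_point discussed in the thread.

    Times one (parameter setting, fold) evaluation and returns a dict of
    results, including the elapsed wall-clock time.
    """
    start_time = time.time()
    score = train_and_score(params, fold)
    results = {'score': score}
    # The one-line addition suggested in the message.
    results['elapsed'] = time.time() - start_time
    return results


def run_grid(param_grid, n_folds, train_and_score):
    """Evaluate every parameter setting on every fold.

    Returns a nested list with one row per parameter setting and one
    column per fold (an assumed layout, chosen for the illustration).
    """
    return [
        [fit_grid_point(params, fold, train_and_score)
         for fold in range(n_folds)]
        for params in param_grid
    ]


def dummy_train_and_score(params, fold):
    # Placeholder for fitting and scoring a real estimator.
    return 1.0 / params['C'] + fold


param_grid = [{'C': 1}, {'C': 10}]
grid = run_grid(param_grid, n_folds=3, train_and_score=dummy_train_and_score)

# 2D structure of elapsed times: elapsed[i][j] is setting i on fold j.
elapsed = [[cell['elapsed'] for cell in row] for row in grid]
```

With a real search the nested list could be converted to a 2D array, which is presumably what the message means by "a 2d array of elapsed times".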

Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

2013-03-11 Thread Joel Nothman
An implementation, without backwards compatibility or new tests (but with tests.test_grid_search modified to pass) at https://github.com/jnothman/scikit-learn/tree/cv-enhanced-results (c9d45a3444). I hope you

Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

2013-03-11 Thread Joel Nothman
On Mon, Mar 11, 2013 at 11:24 AM, Joel Nothman wrote: > On 03/10/2013 16:42:44 +0100, Andreas Mueller wrote: >> >> As an aside: if you had all fitted estimators, it would also be quite >> easy to compute the other scores, right? >> Would that be an acceptable solution for you? >> > > I guess so (

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-11 Thread Mathieu Blondel
Hello Andy, Thanks a lot for tackling this. I think the new homepage will be even nicer if we add the "about us" section to the left menu and add a code snippet as suggested by Olivier. Mathieu

Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

2013-03-11 Thread Olivier Grisel
2013/3/11 Lars Buitinck : > 2013/3/11 Olivier Grisel : >> 2013/3/11 Roman Sinayev : >>> I got CountVectorizer about 2x faster without multiprocessing so far, >>> however I have a couple of questions. > > I'm curious how you pulled that off. > >>> 1. Why do we not use max_df and min_df and max_featu

Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

2013-03-11 Thread Lars Buitinck
2013/3/11 Olivier Grisel : > 2013/3/11 Roman Sinayev : >> I got CountVectorizer about 2x faster without multiprocessing so far, >> however I have a couple of questions. I'm curious how you pulled that off. >> 1. Why do we not use max_df and min_df and max_features when custom >> vocabulary is pro

Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

2013-03-11 Thread Olivier Grisel
2013/3/11 Roman Sinayev : > I got CountVectorizer about 2x faster without multiprocessing so far, > however I have a couple of questions. > > 1. Why do we not use max_df and min_df and max_features when custom > vocabulary is provided? > Some people may provide a huge vocabulary, but they wouldn't

[Scikit-learn-general] Hands-on Webinar Series (no charge) The Evolution of Regression from Classical Linear Regression to Modern Ensembles

2013-03-11 Thread Lisa Solomon
Maybe you missed Part 1 of "The Evolution of Regression Modeling from Classical Linear Regression to Modern Ensembles" webinar series, but you can still join for Parts 2, 3, & 4. Register Now for Parts 2, 3, 4: https://www1.gotomeeting.com/register/500959705 Download (optional) a free evaluation

Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

2013-03-11 Thread Roman Sinayev
I got CountVectorizer about 2x faster without multiprocessing so far, however I have a couple of questions. 1. Why do we not use max_df and min_df and max_features when custom vocabulary is provided? Some people may provide a huge vocabulary, but they wouldn't be interested in some words if they'r
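The first question is cut off here, but it appears to ask for document-frequency filtering to apply even when the vocabulary is supplied by the caller. The sketch below shows one way that could look in plain Python. It is not CountVectorizer's implementation: `document_frequencies` and `prune_vocabulary` are hypothetical helpers, tokenization is a naive lowercase whitespace split, and treating `min_df` as an absolute count and `max_df` as a proportion of documents is an assumption made for the example.

```python
from collections import Counter


def document_frequencies(docs, vocabulary):
    """Count, for each vocabulary term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        # Naive tokenization (assumption): lowercase, split on whitespace.
        seen = set(doc.lower().split()) & vocabulary
        df.update(seen)
    return df


def prune_vocabulary(docs, vocabulary, min_df=1, max_df=1.0):
    """Drop terms from a caller-supplied vocabulary by document frequency.

    min_df is an absolute document count; max_df is a proportion of the
    corpus. Both conventions are assumptions for this illustration.
    """
    vocabulary = set(vocabulary)
    df = document_frequencies(docs, vocabulary)
    max_count = max_df * len(docs)
    return sorted(t for t in vocabulary if min_df <= df[t] <= max_count)


docs = ["the cat sat", "the dog sat", "the bird flew"]
vocab = ["the", "cat", "dog", "sat", "zebra"]

# "the" appears in every document, "cat", "dog" and "zebra" in fewer than two.
kept = prune_vocabulary(docs, vocab, min_df=2, max_df=0.9)
print(kept)  # ['sat']
```

Pruning in a separate pass like this keeps the supplied vocabulary untouched unless the caller explicitly asks for filtering, which is one possible answer to the design question raised in the message.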

Re: [Scikit-learn-general] Composite scores in grid_search.BaseSearchCV

2013-03-11 Thread Andreas Mueller
On 03/11/2013 01:24 AM, Joel Nothman wrote: On 03/10/2013 16:42:44 +0100, Andreas Mueller wrote: If you have an elegant solution, I'm all ears, though ;) Here's a hacky solution for my particular case which requires git revert 2d9cb81b8 to work at HEAD. It works by returning a Score