[Scikit-learn-general] Added the ipython/sklearn sprint to the official pycon list, please add your name if you are coming!

2012-01-24 Thread Fernando Perez
Hi folks, I just added our planned common sprint to the pycon sprint page: https://us.pycon.org/2012/community/sprints/projects/ I listed myself and Olivier as 'leaders' just so the organizers have someone to contact. Please add your name to that list if you plan on participating, as I imagine

Re: [Scikit-learn-general] CoefSelectTransformerMixin

2012-01-24 Thread Mathieu Blondel
CoefSelectTransformerMixin must be deprecated or deleted (I would favor the latter, as I guess nobody uses it in user-land code) and replaced by SelectorMixin. I will do it later today. https://github.com/scikit-learn/scikit-learn/issues/518 Mathieu

Re: [Scikit-learn-general] CoefSelectTransformerMixin

2012-01-24 Thread Gael Varoquaux
On Tue, Jan 24, 2012 at 11:07:20PM +0100, Andreas wrote: > Taking the mean means that if a feature has a strong positive weight for one > class and a strong negative weight for another class, they might cancel, > leading to the feature being not present in the solution. > Why does that make sense?
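The cancellation Andreas describes can be illustrated with a small pure-Python sketch. The weight matrix below is made up for illustration, and summing absolute values is only one possible alternative scoring, assumed here for contrast; neither is taken from the thread or from scikit-learn's implementation.

```python
# Hypothetical per-class weights for 3 features in a 2-class linear model.
# Rows are classes, columns are features (values invented for illustration).
coef = [
    [2.0, 0.1, -0.5],   # class 0
    [-2.0, 0.2, 0.5],   # class 1
]


def column_scores(weights, reduce_fn):
    """Apply reduce_fn to each feature's weights across all classes."""
    n_features = len(weights[0])
    return [reduce_fn([row[j] for row in weights]) for j in range(n_features)]


def mean(values):
    return sum(values) / len(values)


def sum_abs(values):
    return sum(abs(v) for v in values)


# Averaging signed weights: feature 0 has strong weights of opposite sign,
# so they cancel and its score becomes zero.
mean_scores = column_scores(coef, mean)

# Summing absolute weights (an assumed alternative): feature 0 gets the
# largest score because the sign no longer matters.
abs_scores = column_scores(coef, sum_abs)

print("mean of signed weights:", mean_scores)
print("sum of absolute weights:", abs_scores)
```

With these made-up numbers, feature 0 is the most strongly weighted in both classes, yet a threshold on the signed mean would discard it.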

[Scikit-learn-general] CoefSelectTransformerMixin

2012-01-24 Thread Andreas
Hi everybody. At the moment I'm trying to understand feature selection. I was looking at the "L1 based feature selection" that is described in the docs. I was trying to use that with LinearSVC but I don't really understand what is going on. Maybe someone can explain. I am in the multi-class setu

Re: [Scikit-learn-general] Add to docs [was Re: Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Gael Varoquaux
On Tue, Jan 24, 2012 at 08:02:34AM -0500, Satrajit Ghosh wrote: >this list generates a lot of practical useful information such as your >response below that gets "lost" (i.e. difficult to search if you don't >have the right terms) in the mailing list archives. could we think about >

Re: [Scikit-learn-general] Add to docs [was Re: Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Olivier Grisel
2012/1/24 Satrajit Ghosh : > hi olivier and others, > > this list generates a lot of practical useful information such as your > response below that gets "lost" (i.e. difficult to search if you don't have > the right terms) in the mailing list archives. could we think about how to > capture such in

[Scikit-learn-general] Add to docs [was Re: Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Satrajit Ghosh
hi olivier and others, this list generates a lot of practical useful information such as your response below that gets "lost" (i.e. difficult to search if you don't have the right terms) in the mailing list archives. could we think about how to capture such information in the docs/wiki? cheers,

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Olivier Grisel
Which classifier have you tried? Are you sure you selected the best hyper-parameters with GridSearchCV? Have you tried to normalize the dataset? For instance, have a look at: http://scikit-learn.org/dev/modules/preprocessing.html For very sparse data with large variance in the feature, you shou

Re: [Scikit-learn-general] Best classification for very sparse and skewed feature matrix

2012-01-24 Thread Philipp Singer
On 15.01.2012 19:45, Gael Varoquaux wrote: > On Sun, Jan 15, 2012 at 07:39:00PM +0100, Philipp Singer wrote: >> The problem is that my representation is very sparse so I have a huge >> amount of zeros. > That's actually good: some of our estimators are able to use a sparse > representation to spe

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-24 Thread Dimitrios Pritsos
On 01/24/2012 10:49 AM, Olivier Grisel wrote: > 2012/1/24 Dimitrios Pritsos: >> Thank you very much for the advice. I will try this too (today!). >> However, it seems that I might need to use partial_fit() in the near >> future, after I collect/crawl a new corpus. >> So a question is, my re

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-24 Thread Olivier Grisel
2012/1/24 Dimitrios Pritsos : > > Thank you very much for the advice. I will try this too (today!). > However, it seems that I might need to use partial_fit() in the near > future, after I collect/crawl a new corpus. > So a question is, my result (20%) was due to some sort of bug in > part

Re: [Scikit-learn-general] : FIT() using PyTables with very hight scalable data

2012-01-24 Thread Dimitrios Pritsos
On 01/23/2012 09:11 PM, Olivier Grisel wrote: > 2012/1/23 Dimitrios Pritsos: >> However, when I do the same test using partial_fit() for the same >> sub-set of my Data Set (see above) I am getting ~20%. >> >> Any Suggestions? > Do a grid search to find the best alpha on SGDClassifier (and on C for