Re: [Scikit-learn-general] Implementing n_jobs for OneVsRestClassifier

2012-12-04 Thread Andreas Mueller
On 12/04/2012 02:05 AM, Afik Cohen wrote: > Andreas Mueller writes: > >> On 12/03/2012 09:39 PM, Afik Cohen wrote: >>> No, we aren't doing multi-label classification, just multiclass. He was > saying >>> we could just use SGDClassifier directly, which is true, but AFAIK there is > no >>> way to ge

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Philipp Singer
> It's probably better to train a linear classifier on the text features > alone and a second (potentially non linear classifier such as GBRT or > ExtraTrees) on the predict_proba outcome of the text classifier + your > additional low dim features. > > This is some kind of stacking method (a sort
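The two-stage setup quoted above could be sketched roughly as follows. This is only an illustration under assumptions: the toy texts, the two extra features, LogisticRegression as the linear text classifier, and the use of cross_val_predict to get out-of-fold probabilities are all choices made for this sketch, not something stated in the thread.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

# Toy data invented for this sketch (not from the thread).
texts = [
    "cheap pills buy now", "win money fast today",
    "buy cheap watches now", "fast money win prize",
    "meeting agenda for monday", "project report attached here",
    "agenda and report for review", "monday project meeting notes",
]
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
# Hypothetical additional low-dimensional features, one row per text.
extra = np.array([
    [3.0, 0.1], [2.5, 0.3], [2.8, 0.2], [3.1, 0.4],
    [0.5, 0.9], [0.7, 0.8], [0.4, 0.7], [0.6, 0.6],
])

# Stage 1: linear classifier on the text features alone.
text_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Out-of-fold probabilities, so the second stage is not trained on
# probabilities the text classifier produced for its own training rows.
proba = cross_val_predict(text_clf, texts, y, cv=2, method="predict_proba")

# Stage 2: non-linear model on [probabilities + additional features].
stacked = np.hstack([proba, extra])
second = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(stacked, y)

# Refit stage 1 on all data for use at prediction time.
text_clf.fit(texts, y)


def predict(new_texts, new_extra):
    """Run both stages on unseen samples."""
    new_proba = text_clf.predict_proba(new_texts)
    return second.predict(np.hstack([new_proba, new_extra]))
```

A call such as predict(["buy cheap pills"], np.array([[2.5, 0.2]])) returns one label per input row.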

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Andreas Mueller
Am 04.12.2012 11:45, schrieb Philipp Singer: >> It's probably better to train a linear classifier on the text features >> alone and a second (potentially non linear classifier such as GBRT or >> ExtraTrees) on the predict_proba outcome of the text classifier + your >> additional low dim features. >

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Olivier Grisel
2012/12/4 Philipp Singer : > >> It's probably better to train a linear classifier on the text features >> alone and a second (potentially non linear classifier such as GBRT or >> ExtraTrees) on the predict_proba outcome of the text classifier + your >> additional low dim features. >> >> This is som

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Andreas Mueller
Am 04.12.2012 12:20, schrieb Olivier Grisel: > 2012/12/4 Philipp Singer : >>> It's probably better to train a linear classifier on the text features >>> alone and a second (potentially non linear classifier such as GBRT or >>> ExtraTrees) on the predict_proba outcome of the text classifier + your >

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Philipp Singer
Am 04.12.2012 12:26, schrieb Andreas Mueller: > Am 04.12.2012 12:20, schrieb Olivier Grisel: >> 2012/12/4 Philipp Singer : It's probably better to train a linear classifier on the text features alone and a second (potentially non linear classifier such as GBRT or ExtraTrees) on the p

[Scikit-learn-general] Upgraded jenkins environment for matplotlib testing

2012-12-04 Thread Olivier Grisel
I have updated the virtualenvs of the jenkins vm to use: - ubuntu LTS matplotlib 0.99.1 on python 2.6 - latest stable matplotlib 1.2.0 on python 2.7 -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Philipp Singer
> Have you scaled your additional features to the [0-1] range as the > probability features from the text classifier? > Until now I performed Scaler() (I'm on 0.12 atm) on the new feature space. Should I do this on my appended features only? But well, they are not exactly between 0 and 1 then. I

Re: [Scikit-learn-general] Upgraded jenkins environment for matplotlib testing

2012-12-04 Thread Andreas Mueller
Am 04.12.2012 12:35, schrieb Olivier Grisel: > I have updated the virtualenvs of the jenkins vm to use: > > - ubuntu LTS matplotlib 0.99.1 on python 2.6 > - latest stable matplotlib 1.2.0 on python 2.7 > Merci beaucoup :)

Re: [Scikit-learn-general] Upgraded jenkins environment for matplotlib testing

2012-12-04 Thread Peter Prettenhofer
thanks! 2012/12/4 Andreas Mueller : > Am 04.12.2012 12:35, schrieb Olivier Grisel: >> I have updated the virtualenvs of the jenkins vm to use: >> >> - ubuntu LTS matplotlib 0.99.1 on python 2.6 >> - latest stable matplotlib 1.2.0 on python 2.7 >> > Merci beaucoup :) > > ---

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Olivier Grisel
2012/12/4 Philipp Singer : > >> Have you scaled your additional features to the [0-1] range as the >> probability features from the text classifier? >> > > Until now I performed Scaler() (im on 0.12 atm) on the new feature > space. Should I do this on my appended features only? But well, they are >

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Olivier Grisel
2012/12/4 Philipp Singer : > > I use a linear SVM for learning my probabilities for the samples (I have > used grid search for determining the optimal parameters). Then I append > the additional features and do as suggested gradient boosting or extra > tree classifier. What do you mean by testing ju

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Philipp Singer
Am 04.12.2012 15:15, schrieb Olivier Grisel: > 2012/12/4 Philipp Singer : >> >>> Have you scaled your additional features to the [0-1] range as the >>> probability features from the text classifier? >>> >> >> Until now I performed Scaler() (im on 0.12 atm) on the new feature >> space. Should I do t

Re: [Scikit-learn-general] Append additional data in pipeline

2012-12-04 Thread Olivier Grisel
Tree based models such as ExtraTrees do not need scaling at all. So the difference you see is probably just cross validation (especially with such a small number of samples). Scaling is only useful for models that make a prior assumption on the feature distributions such as the L2 regularizer (
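The point about tree-based models and scaling can be checked with a small experiment. The sketch below uses synthetic data and arbitrary per-feature scale factors, both invented here. The factors are powers of two so that the rescaling is exact in floating point; with other factors the predictions are still expected to agree, but tiny rounding differences are possible.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data invented for this sketch.
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_test = X[:150], X[150:]
y_train = y[:150]

# Arbitrary per-feature rescaling (powers of two keep the arithmetic exact).
scale = np.array([1024.0, 8.0, 2.0])

# Same model and same random_state, fitted on raw and on rescaled features.
raw = ExtraTreesClassifier(n_estimators=20, random_state=0)
raw.fit(X_train, y_train)
scaled = ExtraTreesClassifier(n_estimators=20, random_state=0)
scaled.fit(X_train * scale, y_train)

pred_raw = raw.predict(X_test)
pred_scaled = scaled.predict(X_test * scale)

# Split thresholds move with the features, so the predictions should match.
same = bool(np.array_equal(pred_raw, pred_scaled))
print("identical predictions:", same)
```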

Re: [Scikit-learn-general] Implementing n_jobs for OneVsRestClassifier

2012-12-04 Thread Afik Cohen
> > Will this let us run SGDClassifier and show us per-class probability outputs? > > Again, that's the only reason we've been using OneVsRestClassifier. Let me > > explain what I mean by per-class probability, just in case it isn't clear: > > > > SGDClassifier's predict_proba() returns probabilit

Re: [Scikit-learn-general] Implementing n_jobs for OneVsRestClassifier

2012-12-04 Thread Andreas Mueller
>>> [(0.4, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1), (0.6, 0.4)] for five >>> classes >>> showing the probability that the input does not belong/does belong to that >>> class, respectively. >>> >> Yes, if you don't normalize. >> You are aware that this is inconsistent when you are doing multi-cla
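To make the "if you don't normalize" remark concrete, here is a small sketch using the pairs quoted above exactly as they appear. It assumes that "normalize" means dividing each class's "belongs" probability by the sum over all classes, which is one common reading and not necessarily what the posters had in mind.

```python
# (does not belong, does belong) pairs, copied as quoted in the thread.
pairs = [(0.4, 0.5), (0.7, 0.3), (0.8, 0.2), (0.9, 0.1), (0.6, 0.4)]

# Keep only the "does belong" probability of each one-vs-rest classifier.
positive = [p_yes for _, p_yes in pairs]

# The independent binary classifiers need not sum to 1 across classes.
total = sum(positive)

# Assumed normalization: rescale so the five class scores sum to 1.
normalized = [p / total for p in positive]

print("unnormalized sum:", total)
print("normalized:", [round(p, 3) for p in normalized])
```

Without the rescaling step the five "belongs" values are independent estimates, which is why they cannot be read as a single distribution over the classes.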

Re: [Scikit-learn-general] Shape of classes_ varies?

2012-12-04 Thread Gilles Louppe
Doug, You will be happy to hear that this is now freshly fixed in master. Attributes are now flat in case of single output problems (as you expected) and nested for multi-output problems (as before). Best, Gilles On 30 November 2012 17:31, Gael Varoquaux wrote: >> I guess transforming it would
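The flat versus nested layout described above can be illustrated like this. The sketch assumes a tree estimator (DecisionTreeClassifier is picked here as an example; the snippet does not say which estimators the fix covered) and a reasonably recent scikit-learn; the toy data is invented.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data invented for this sketch.
X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Single-output problem: y is 1-D, so classes_ is one flat array.
y_single = np.array([0, 1, 1, 0])
single = DecisionTreeClassifier(random_state=0).fit(X, y_single)

# Multi-output problem: y has one column per output,
# so classes_ holds one array of labels per output.
y_multi = np.array([[0, 2], [1, 3], [1, 2], [0, 3]])
multi = DecisionTreeClassifier(random_state=0).fit(X, y_multi)

print("single-output classes_:", single.classes_)
print("multi-output classes_:", multi.classes_)
```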