Re: [Scikit-learn-general] ValueError: The number of classes has to be greater than one.

2014-09-11 Thread Joel Nothman
Use StratifiedKFold.

On 12 September 2014 13:03, Pagliari, Roberto wrote:
> When using SVM or LinearSVC, is it possible to force
> cross_validation.KFold to generate subsets with both classes (in the case
> of a two-class problem)?
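(A minimal sketch of the suggestion, assuming the sklearn.cross_validation API of the time; the toy X and y are made up for illustration:)

    # StratifiedKFold preserves the class proportions of y in every fold,
    # so each train/test split of a two-class problem contains both classes.
    import numpy as np
    from sklearn.cross_validation import StratifiedKFold, cross_val_score
    from sklearn.svm import LinearSVC

    X = np.random.rand(20, 3)            # toy data, 20 samples
    y = np.array([0] * 10 + [1] * 10)    # two balanced classes
    skf = StratifiedKFold(y, n_folds=5)
    scores = cross_val_score(LinearSVC(), X, y, cv=skf)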

[Scikit-learn-general] ValueError: The number of classes has to be greater than one.

2014-09-11 Thread Pagliari, Roberto
When using SVM or LinearSVC, is it possible to force cross_validation.KFold to generate subsets with both classes (in the case of a two-class problem)?

Re: [Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Joel Nothman
Good point. It should be straightforward in any case, something like:

    import numpy as np
    import sklearn.base

    class Quantizer(sklearn.base.TransformerMixin):
        def __init__(self, thresholds):
            self.thresholds = thresholds  # must be sorted
        def transform(self, X, y=None):
            return np.searchsorted(self.thresholds, X)
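(A hypothetical usage example of the sketch above, assuming numpy is imported as np:)

    q = Quantizer(thresholds=[0.25, 0.5, 0.75])
    # searchsorted returns, for each value, the number of thresholds
    # strictly below it, i.e. a bin index in 0..len(thresholds).
    q.transform(np.array([0.1, 0.3, 0.9]))  # -> array([0, 1, 3])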

Re: [Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Pagliari, Roberto
In my case I would like to do it right after scaling, while doing grid search. That would be different from quantizing the entire training set at the beginning. Thank you,

Re: [Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Joel Nothman
If thresholds can be provided to the constructor then they are not estimated automatically from the training data. This is the sort of preprocessing you can and should do with pandas.

On 12 September 2014 10:53, Pagliari, Roberto wrote:
> I’m getting errors about get_params_ missing etc…
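(For instance, pd.cut with explicit bin edges does the quantization in pandas; the column name 'feature' and the thresholds below are made up, and labels=False asks for integer bin codes:)

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'feature': [0.1, 0.3, 0.9]})
    thresholds = [0.25, 0.5, 0.75]
    # Pad the thresholds with -inf/+inf so every value falls in some bin.
    bins = [-np.inf] + thresholds + [np.inf]
    codes = pd.cut(df['feature'], bins=bins, labels=False)  # 0, 1, 3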

Re: [Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Pagliari, Roberto
I’m getting errors about get_params_ missing etc… I guess I need to derive my own binarizer from some other classes. Is there a way to simplify the process? Essentially, what I need is the binarizer, with more levels (and thresholds provided to the constructor). Thank you
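(A sketch of the simplest route: inherit from BaseEstimator, which supplies get_params/set_params by introspecting __init__, and add a no-op fit so Pipeline and GridSearchCV accept it. Not an official recipe:)

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin

    class Quantizer(BaseEstimator, TransformerMixin):
        """Bin features at fixed thresholds (a sketch, not a library class)."""
        def __init__(self, thresholds):
            self.thresholds = thresholds  # must be sorted

        def fit(self, X, y=None):
            return self  # nothing to estimate; thresholds are fixed

        def transform(self, X, y=None):
            return np.searchsorted(self.thresholds, X)

With fit in place, the class can sit inside a Pipeline and be grid-searched like any other step.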

Re: [Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Joel Nothman
For quantizing or binning? Not currently.

On 12 September 2014 06:31, Pagliari, Roberto wrote:
> Is there something like the binarizer with more levels (thresholds
> provided with input)?
>
> Thanks

Re: [Scikit-learn-general] full pegasos learning rate support

2014-09-11 Thread F. William High
That appears to work (with a small modification):

    param_grid = [{'eta0': [1 / alpha_this_step], 'alpha': [alpha_this_step]}
                  for alpha_this_step in 10.0 ** (np.arange(11) - 5)]

Neat trick. Thanks. Unfortunately I don't have time to do the experiment, but eta = 1/t and eta = 1/(alpha*t) seem to be pretty
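(Putting the trick together as a runnable sketch; the SGDClassifier settings and the commented fit call are illustrative, with X and y standing for your training data:)

    import numpy as np
    from sklearn.grid_search import GridSearchCV
    from sklearn.linear_model import SGDClassifier

    # A list of single-point grids: each dict pins eta0 to 1/alpha, so the
    # pair is searched together instead of as a full cross-product.
    alphas = 10.0 ** (np.arange(11) - 5)
    param_grid = [{'eta0': [1 / a], 'alpha': [a]} for a in alphas]
    clf = SGDClassifier(loss='hinge', learning_rate='invscaling', power_t=1.0)
    gs = GridSearchCV(clf, param_grid=param_grid)
    # gs.fit(X, y)

See the note further down the thread for why invscaling with power_t=1.0 turns this coupling into the Pegasos rate eta = 1/(alpha*t).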

[Scikit-learn-general] binarizer with more levels

2014-09-11 Thread Pagliari, Roberto
Is there something like the binarizer with more levels (thresholds provided with input)? Thanks

Re: [Scikit-learn-general] full pegasos learning rate support

2014-09-11 Thread Andy
You can actually do that using the current grid search. Specify the "grid" as a list of single grid points:

    param_grid = [{'eta0': 1 / alpha_this_step, 'alpha': alpha_this_step}
                  for alpha_this_step in my_alphas]

That should do it, right? I think for the "optimum" the same guarantee

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
My mistake. I was using those names in the wrong place. Thank you,

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Andy
On 09/11/2014 07:58 PM, Pagliari, Roberto wrote:
> I'm getting errors when using these parameters
>
> linear_svm__penalty
> linear_svm__loss
> linear_svm__dual
>
> don't they have the same names?

They do. Can you provide a snippet and a traceback?

Re: [Scikit-learn-general] full pegasos learning rate support

2014-09-11 Thread F. William High
Just noticed I got my Greeks wrong. By nu I meant eta everywhere.

On Thu, Sep 11, 2014 at 2:24 PM, F. William High wrote:
> The Shalev-Shwartz et al. Pegasos update rule on the learning rate
> parameter is
>
> nu_i = 1 / (lambda * i)
>
> where lambda multiplies the regularization term. If thi

Re: [Scikit-learn-general] gridsearch with sparsematrices

2014-09-11 Thread Manoj Kumar
This is simply because centering a sparse matrix (that is, subtracting its mean) would make it dense, and there would be no point in it being sparse in the first place.

On Thu, Sep 11, 2014 at 9:09 PM, Pagliari, Roberto wrote:
> This is what I'm getting when using sparse matric
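(A minimal sketch of the with_mean=False route; the toy sparse matrix is made up. Dividing by the standard deviation alone leaves zeros untouched, so the matrix stays sparse:)

    import scipy.sparse as sp
    from sklearn.preprocessing import StandardScaler

    X = sp.rand(100, 20, density=0.1, format='csr')  # toy sparse data
    # with_mean=False skips centering, so zeros stay zero and X stays
    # sparse; each feature is still divided by its standard deviation.
    scaler = StandardScaler(with_mean=False)
    X_scaled = scaler.fit_transform(X)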

[Scikit-learn-general] gridsearch with sparsematrices

2014-09-11 Thread Pagliari, Roberto
This is what I'm getting when using sparse matrices with grid search:

    Cannot center sparse matrices: pass `with_mean=False` instead. See docstring for motivation and alternatives.

If I set with_mean=False, does that mean the data will not be scaled with respect to the mean? If so, why would one

[Scikit-learn-general] full pegasos learning rate support

2014-09-11 Thread F. William High
The Shalev-Shwartz et al. Pegasos update rule on the learning rate parameter is

    nu_i = 1 / (lambda * i)

where lambda multiplies the regularization term. If this rule is used, they show you can converge to an error of epsilon in O(1/(lambda*epsilon)) iterations with high probability. This differs
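(A hedged note on how this maps onto SGDClassifier's schedules: learning_rate='invscaling' uses eta = eta0 / t**power_t, so power_t=1.0 with eta0 tied to 1/alpha reproduces the Pegasos rate. The concrete alpha below is illustrative:)

    from sklearn.linear_model import SGDClassifier

    # With invscaling and power_t=1.0, eta = eta0 / t; setting
    # eta0 = 1/alpha then gives the Pegasos schedule eta_t = 1/(alpha*t).
    alpha = 1e-3  # illustrative value
    clf = SGDClassifier(alpha=alpha, learning_rate='invscaling',
                        eta0=1.0 / alpha, power_t=1.0)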

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Josh Vredevoogd
You'll want a predict() or a transform() as well, depending on whether it's acting as an estimator or a transformer.

On Thu, Sep 11, 2014 at 9:57 AM, Pagliari, Roberto wrote:
> Thank you, I’m going to try shortly.
>
> And in general, if I wanted to put my own function in the pipeline, the

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
I'm getting errors when using these parameters:

    linear_svm__penalty
    linear_svm__loss
    linear_svm__dual

Don't they have the same names? I tried linear_svc but it doesn't work either. Thank you

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
Thank you, I’m going to try shortly. And in general, if I wanted to put my own function in the pipeline, is the only requirement that the class must have the “fit” method? Thank you again,

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Laurent Direr
Hi,

If you test this code you will see it raises an error ;). The naming of the parameters in param_grid should be consistent with the step names in the Pipeline object. GridSearchCV performs the grid search on the Pipeline object, so it cannot understand what the 'LinearSVC__C' parameter means. If

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Josh Vredevoogd
You're missing "estimators =" in the first line, I guess. params should be:

    params = dict(linear_svm__C=[0.1, 10, 100])

On Thu, Sep 11, 2014 at 9:44 AM, Pagliari, Roberto wrote:
> Hi,
> Yes, I think you are right.
>
> Is the code below how it should be done (scaling+linearsvc)?
>
> [('scaler', Sca
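(For completeness, the corrected snippet in full — a sketch using the current names, with StandardScaler standing in for the Scaler() in the quoted code. Note the param_grid key reuses the pipeline step name 'linear_svm', not the class name:)

    from sklearn.grid_search import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    estimators = [('scaler', StandardScaler()), ('linear_svm', LinearSVC())]
    clf = Pipeline(estimators)
    # Keys are '<step name>__<parameter>', so 'linear_svm__C' targets C
    # on the LinearSVC step.
    params = dict(linear_svm__C=[0.1, 10, 100])
    gs = GridSearchCV(clf, param_grid=params)
    # gs.fit(X, y)  # the scaler is re-fit on each training fold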

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
Hi,
Yes, I think you are right.

Is the code below how it should be done (scaling+linearsvc)?

    [('scaler', Scaler()), ('linear_svm', LinearSVC())]
    clf = Pipeline(estimators)
    params = dict(LinearSVC__C=[0.1, 10, 100])
    gs = GridSearchCV(clf, param_grid=params)

Thank you,

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Laurent Direr
Hello,

I think a pipeline does precisely what you are asking for:
http://scikit-learn.org/stable/modules/pipeline.html

If you include the scaler as a step in the pipeline, it should behave the way you described in your first email.

Laurent

On 09/11/2014 04:59 PM, Pagliari, Roberto wrote:
> I'm

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
I'm not trying to scale the dataset at the very beginning. I would like to scale while doing GridSearchCV. Thanks,

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
I'm not sure how to do it when using gridsearch. Can you provide an example? Thank you,

Re: [Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Gael Varoquaux
Use a pipeline.

G

On Thu, Sep 11, 2014 at 02:47:48PM +0000, Pagliari, Roberto wrote:
> Hello,
> Gridsearch with CV is something like this at a high level:
>
> for every combination of parameters:
>     for every partition of training data:
>         split training into train_cv and test_cv

[Scikit-learn-general] modify gridsearch to scale cross-validation training/test dataset

2014-09-11 Thread Pagliari, Roberto
Hello,

Gridsearch with CV is something like this at a high level:

    for every combination of parameters:
        for every partition of training data:
            split training into train_cv and test_cv
            train_classifier(train_cv).predict(test_cv)
            compute score
        average score
        if max so far, then u