Re: [Scikit-learn-general] Parallel GridSearchCV on sparse.SVC fails with ValueError

2011-11-07 Thread Fabian Pedregosa
Should be fixed in current master: https://github.com/scikit-learn/scikit-learn/commit/09677e0193a9dc693f85684bb1c2da56fe6804a2 I haven't attempted to reproduce the issue, though, so please report back if it is still not working. Best, Fabian. On Sun, Nov 6, 2011 at 5:56 PM, Olivier Grisel w

2011-11-06 Thread Olivier Grisel
2011/11/6 Lars Buitinck : > > Shall we apply this workaround in the master as well? That way, grid > search will also work with user-defined estimators that don't uphold > the contract; better safe than sorry. I've already refactored Sami's > script into a testcase with a custom BrokenEstimator. A
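Olivier's test case itself is not shown in the snippet. As a rough, self-contained sketch of the idea, here is an estimator that refuses a second `fit` and still works with a search that refits an unfitted copy of the best candidate. All names (`BrokenEstimator`'s exact behaviour, `fresh_copy`, `tiny_grid_search`) are hypothetical illustrations, not scikit-learn's API.

```python
class BrokenEstimator:
    """Hypothetical estimator that violates the fit contract:
    calling fit a second time on the same instance raises ValueError."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self):
        return {"C": self.C}

    def fit(self, X, y):
        if getattr(self, "is_fitted_", False):
            raise ValueError("this instance cannot be re-fit")
        self.is_fitted_ = True
        return self

    def score(self, X, y):
        # Dummy score that peaks at C == 10, just so the search has a winner.
        return -abs(self.C - 10)


def fresh_copy(estimator):
    # New, unfitted instance with the same constructor parameters.
    return type(estimator)(**estimator.get_params())


def tiny_grid_search(estimator, grid, X, y):
    """Heavily simplified, sequential grid search over the C parameter."""
    best_score, best_candidate = None, None
    for C in grid:
        params = dict(estimator.get_params(), C=C)
        candidate = type(estimator)(**params).fit(X, y)
        score = candidate.score(X, y)
        if best_score is None or score > best_score:
            best_score, best_candidate = score, candidate
    # best_candidate.fit(X, y) would raise for BrokenEstimator,
    # so refit an unfitted copy instead.
    return fresh_copy(best_candidate).fit(X, y)


best = tiny_grid_search(BrokenEstimator(), [1, 10, 100], [[0], [1]], [0, 1])
print(best.C)  # 10
```

Refitting a copy rather than the fitted candidate means the search never depends on an estimator honouring the re-fit contract.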

2011-11-06 Thread Lars Buitinck
2011/11/6 Olivier Grisel : > 2011/11/6 Lars Buitinck : >> 2011/11/6 Vlad Niculae : >>> This is exactly what I would expect as well. I think this is the >>> biggest gain of having objects instead of functions: being able to >>> store init parameters in an object that you will then fit and evaluate >

2011-11-06 Thread Mathieu Blondel
On Sun, Nov 6, 2011 at 11:59 PM, Olivier Grisel wrote: > The fit contract is that it can be called several times and that the I added this "fit contract" to issue #406 https://github.com/scikit-learn/scikit-learn/issues/406 Mathieu -

2011-11-06 Thread Vlad Niculae
Yes, I was thinking of a sequential, exploratory IPython-style thing where you change something in your X and re-fit, when you don't want to clone and delete the old estimator. Hope this makes sense. Vlad > 2011/11/6 Lars Buitinck : >> 2011/11/6 Vlad Niculae : >>> This is exactly what I would exp

2011-11-06 Thread Olivier Grisel
2011/11/6 Lars Buitinck : > 2011/11/6 Vlad Niculae : >> This is exactly what I would expect as well. I think this is the >> biggest gain of having objects instead of functions: being able to >> store init parameters in an object that you will then fit and evaluate >> on different sets. > > Isn't th

2011-11-06 Thread Lars Buitinck
2011/11/6 Vlad Niculae : > This is exactly what I would expect as well. I think this is the > biggest gain of having objects instead of functions: being able to > store init parameters in an object that you will then fit and evaluate > on different sets. Isn't that what base.clone is for? -- Lar

2011-11-06 Thread Vlad Niculae
> I would personally prefer if all estimators in scikit-learn could be > refitted. This is what I would expect from a batch fit method. > Mathieu > The fit contract is that it can be called several times and that the > results of previous calls are forgotten. Warm restart should be > handled expli
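A toy illustration of the fit contract quoted above: repeated calls to `fit` forget earlier results unless warm restart is requested explicitly. The quoted message is cut off before saying how warm restart should be exposed, so the `warm_start` flag and the `RunningMeanEstimator` class are assumptions made for this sketch only.

```python
class RunningMeanEstimator:
    """Hypothetical toy estimator illustrating the fit contract.

    By default each call to fit discards what earlier calls learned.
    Keeping state across calls is opt-in via warm_start (an assumed
    parameter name for this sketch).
    """

    def __init__(self, warm_start=False):
        self.warm_start = warm_start

    def fit(self, X):
        if not (self.warm_start and hasattr(self, "n_seen_")):
            # Forget everything learned by previous calls.
            self.n_seen_ = 0
            self.total_ = 0.0
        self.n_seen_ += len(X)
        self.total_ += sum(X)
        self.mean_ = self.total_ / self.n_seen_
        return self


est = RunningMeanEstimator()
print(est.fit([1, 2, 3]).mean_)  # 2.0
print(est.fit([10]).mean_)       # 10.0: the first call is forgotten

warm = RunningMeanEstimator(warm_start=True)
warm.fit([1, 2, 3])
print(warm.fit([10]).mean_)      # 4.0: state carried over on request
```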

2011-11-06 Thread Olivier Grisel
2011/11/6 Lars Buitinck : > 2011/11/6 Sami Liedes : >> clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS, n_jobs=10) >> clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True)) > > Reproduced with a much smaller set of TUNED_PARAMS. With n_jobs=1, the > problem does not occur. > > One solution i

2011-11-06 Thread Mathieu Blondel
On Sun, Nov 6, 2011 at 11:44 PM, Lars Buitinck wrote: > One solution is to clone(best_estimator) at grid_search.py, before > line 341 (in the if self.refit block). Currently, the code assumes > that a fit estimator can simply be re-fit on a new dataset, which is > not true of sparse SVMs due to t

2011-11-06 Thread Lars Buitinck
2011/11/6 Sami Liedes : > clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS, n_jobs=10) > clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True)) Reproduced with a much smaller set of TUNED_PARAMS. With n_jobs=1, the problem does not occur. One solution is to clone(best_estimator) at grid_sea
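A simplified sketch of what cloning before the refit buys: a new instance built from the constructor parameters carries no fitted state from the search. `simple_clone` is a hypothetical stand-in loosely modelled on `sklearn.base.clone`, not its actual implementation, and it assumes `get_params()` returns exactly the constructor arguments; `ToyEstimator` and `refit_best` are likewise illustrative.

```python
import copy


def simple_clone(estimator):
    """Hypothetical, simplified stand-in for sklearn.base.clone.

    Assumes estimator.get_params() returns the constructor arguments.
    Parameters are deep-copied; fitted attributes are not carried over.
    """
    params = {name: copy.deepcopy(value)
              for name, value in estimator.get_params().items()}
    return type(estimator)(**params)


class ToyEstimator:
    def __init__(self, C=1.0, class_weight=None):
        self.C = C
        self.class_weight = class_weight

    def get_params(self):
        return {"C": self.C, "class_weight": self.class_weight}

    def fit(self, X, y):
        self.n_samples_fit_ = len(X)
        return self


def refit_best(best_estimator, X, y):
    # Sketch of the refit step: fit a clean clone on the full data rather
    # than re-fitting the instance that was already fitted during the search.
    return simple_clone(best_estimator).fit(X, y)


searched = ToyEstimator(C=10, class_weight={1: 2.0}).fit([[0]] * 3, [0] * 3)
final = refit_best(searched, [[0]] * 5, [0] * 5)
```

Deep-copying the parameters also keeps mutable arguments (such as a `class_weight` dict) from being shared between the searched and the refitted instance.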

2011-11-06 Thread Sami Liedes
On Sun, Nov 06, 2011 at 09:41:31AM +0100, Gael Varoquaux wrote: > Can you send us a small self-contained script that reproduces your > problem, and we'll try to fix it ASAP. Sure. This script reproduces the problem for me: from sklearn

2011-11-06 Thread Olivier Grisel
2011/11/6 Gael Varoquaux : > On Sun, Nov 06, 2011 at 09:23:14AM +0100, Olivier Grisel wrote: >> > 1) make indices=True the default in cv objects > >> > 2) in check_cv, raise an exception if cv.indices=False and hasattr(X, >> > "tocsr") > >> I think we already for 1) in the past. Let me implement b

2011-11-06 Thread Gael Varoquaux
On Sun, Nov 06, 2011 at 12:54:36AM +0200, Sami Liedes wrote: > It seems that sparse.SVC and GridSearchCV don't play along nicely if I > pass a parameter n_jobs > 1 to GridSearchCV(). At some point I get > ValueError("cannot resize this array: it does not own its data") from > inside libsvm.pyx: Hi

2011-11-06 Thread Gael Varoquaux
On Sun, Nov 06, 2011 at 09:23:14AM +0100, Olivier Grisel wrote: > > 1) make indices=True the default in cv objects > > 2) in check_cv, raise an exception if cv.indices=False and hasattr(X, > > "tocsr") > I think we already for 1) in the past. Let me implement both 1) and > 2). +1 for 1). This p
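Proposals 1) and 2) could look roughly like the following. `ToyKFold` and `check_cv_sketch` are hypothetical illustrations, not the code that was eventually committed; the sparse check relies only on the `hasattr(X, "tocsr")` test quoted in the thread.

```python
class ToyKFold:
    """Hypothetical K-fold iterator with indices=True as the default (idea 1)."""

    def __init__(self, n, k, indices=True):
        self.n, self.k, self.indices = n, k, indices

    def __iter__(self):
        for fold in range(self.k):
            test = [i % self.k == fold for i in range(self.n)]  # boolean mask
            train = [not t for t in test]
            if self.indices:
                # Convert the masks to integer positions.
                yield ([i for i, m in enumerate(train) if m],
                       [i for i, m in enumerate(test) if m])
            else:
                yield train, test


def check_cv_sketch(cv, X):
    """Idea 2): reject mask-based CV when X looks like a scipy.sparse matrix.

    Objects without an `indices` attribute are assumed to be fine.
    """
    if not getattr(cv, "indices", True) and hasattr(X, "tocsr"):
        raise ValueError("boolean masks cannot index sparse matrices; "
                         "use indices=True")
    return cv
```

Raising early in the checking step turns a confusing failure deep inside an estimator into an explicit message at the point where the CV object is validated.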

2011-11-06 Thread Fabian Pedregosa
On Sun, Nov 6, 2011 at 1:26 AM, Lars Buitinck wrote: > 2011/11/6 Sami Liedes : >> On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote: >>> With sparse data, you should use the indices=True argument to >>> StratifiedKFold. By default, it will return a boolean mask, which >>> cannot be use

2011-11-06 Thread Olivier Grisel
2011/11/6 Mathieu Blondel : >> With sparse data, you should use the indices=True argument to >> StratifiedKFold. By default, it will return a boolean mask, which >> cannot be used to index into a sparse matrix. > > We really need to do something about this issue, as it keeps popping > up. A few ide

2011-11-05 Thread Mathieu Blondel
> With sparse data, you should use the indices=True argument to > StratifiedKFold. By default, it will return a boolean mask, which > cannot be used to index into a sparse matrix. We really need to do something about this issue, as it keeps popping up. A few ideas: 1) make indices=True the defaul

2011-11-05 Thread Lars Buitinck
2011/11/6 Sami Liedes : > On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote: >> With sparse data, you should use the indices=True argument to >> StratifiedKFold. By default, it will return a boolean mask, which >> cannot be used to index into a sparse matrix. > > Ah, didn't know of that

2011-11-05 Thread Sami Liedes
On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote: > 2011/11/5 Sami Liedes : > >    train,test = iter(StratifiedKFold(DATA.classes, 2)).next() > > With sparse data, you should use the indices=True argument to > StratifiedKFold. By default, it will return a boolean mask, which > cannot

2011-11-05 Thread Lars Buitinck
2011/11/5 Sami Liedes : >    train,test = iter(StratifiedKFold(DATA.classes, 2)).next() With sparse data, you should use the indices=True argument to StratifiedKFold. By default, it will return a boolean mask, which cannot be used to index into a sparse matrix. >    # DATA.features is a sparse ma
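Here is the difference between the two kinds of split Lars describes. A boolean mask has one entry per sample, while `indices=True` yields integer positions, which in the versions discussed here is the form a sparse matrix accepts for row selection. The pure-Python sketch below shows the conversion; with NumPy, `np.flatnonzero(mask)` does the same. The list-of-dicts `rows` is a hypothetical stand-in for a sparse matrix stored row by row.

```python
def mask_to_indices(mask):
    """Integer positions where a boolean mask is True.

    NumPy equivalent: np.flatnonzero(mask).
    """
    return [i for i, keep in enumerate(mask) if keep]


def take_rows(rows, indices):
    # Row selection by integer position; rows stands in for a sparse
    # matrix stored row by row (hypothetical toy representation).
    return [rows[i] for i in indices]


train_mask = [True, False, True, True, False]
train_idx = mask_to_indices(train_mask)
print(train_idx)  # [0, 2, 3]

rows = [{0: 1.0}, {2: 3.0}, {}, {1: 0.5}, {4: 2.0}]
print(take_rows(rows, train_idx))  # [{0: 1.0}, {}, {1: 0.5}]
```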

[Scikit-learn-general] Parallel GridSearchCV on sparse.SVC fails with ValueError

2011-11-05 Thread Sami Liedes
Hi! This looks like a bug to me, but since I'm new to sklearn, I thought I'd ask first if I'm doing something wrong before reporting a bug. It seems that sparse.SVC and GridSearchCV don't play along nicely if I pass a parameter n_jobs > 1 to GridSearchCV(). At some point I get ValueError("cannot
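The ValueError in Sami's report is NumPy's message for calling `ndarray.resize` in place on an array that does not own its buffer, such as a view. The snippet below only reproduces that NumPy behaviour. It does not establish which array inside `libsvm.pyx` triggered the error in the parallel grid search.

```python
import numpy as np

base = np.arange(10)
view = base[:5]                      # a view sharing base's buffer
view_owns = view.flags["OWNDATA"]    # False for a view

try:
    view.resize(3)                   # in-place resize requires owning the buffer
    message = None
except ValueError as exc:
    message = str(exc)

copy_owns = view.copy().flags["OWNDATA"]  # a copy has its own buffer
print(view_owns, copy_owns)
print(message)
```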