Should be fixed in current master:
https://github.com/scikit-learn/scikit-learn/commit/09677e0193a9dc693f85684bb1c2da56fe6804a2
I haven't attempted to reproduce the issue myself, though, so please report
back if it is still not working.
Best,
Fabian.
On Sun, Nov 6, 2011 at 5:56 PM, Olivier Grisel w
2011/11/6 Lars Buitinck :
>
> Shall we apply this workaround in the master as well? That way, grid
> search will also work with user-defined estimators that don't uphold
> the contract; better safe than sorry. I've already refactored Sami's
> script into a testcase with a custom BrokenEstimator.
A
2011/11/6 Olivier Grisel :
> 2011/11/6 Lars Buitinck :
>> 2011/11/6 Vlad Niculae :
>>> This is exactly what I would expect as well. I think this is the
>>> biggest gain of having objects instead of functions: being able to
>>> store init parameters in an object that you will then fit and evaluate
>
On Sun, Nov 6, 2011 at 11:59 PM, Olivier Grisel
wrote:
> The fit contract is that it can be called several times and that the
I added this "fit contract" to issue #406
https://github.com/scikit-learn/scikit-learn/issues/406
Mathieu
-
Yes, I was thinking of a sequential, exploratory IPython-style thing
where you change something in your X and re-fit, when you don't want
to clone and delete the old estimator. Hope this makes sense.
Vlad
> 2011/11/6 Lars Buitinck :
>> 2011/11/6 Vlad Niculae :
>>> This is exactly what I would exp
2011/11/6 Lars Buitinck :
> 2011/11/6 Vlad Niculae :
>> This is exactly what I would expect as well. I think this is the
>> biggest gain of having objects instead of functions: being able to
>> store init parameters in an object that you will then fit and evaluate
>> on different sets.
>
> Isn't th
2011/11/6 Vlad Niculae :
> This is exactly what I would expect as well. I think this is the
> biggest gain of having objects instead of functions: being able to
> store init parameters in an object that you will then fit and evaluate
> on different sets.
Isn't that what base.clone is for?
--
Lar
> I would personally prefer if all estimators in scikit-learn could be
> refitted. This is what I would expect from a batch fit method.
> Mathieu
> The fit contract is that it can be called several times and that the
> results of previous calls are forgotten. Warm restart should be
> handled expli
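A minimal sketch of what this fit contract means for an estimator, using a hypothetical toy `MeanEstimator` (not part of scikit-learn). The constructor only stores parameters. Every call to `fit` rebuilds the learned attributes from the data it is given, so calling `fit` again on a different dataset forgets the previous result.

```python
class MeanEstimator:
    """Hypothetical toy estimator that predicts the mean of the training targets."""

    def __init__(self, shrinkage=0.0):
        # The constructor only stores hyperparameters; nothing is learned here.
        self.shrinkage = shrinkage

    def fit(self, X, y):
        # Learned state is recomputed from the current data only, so the
        # results of any previous call to fit are discarded.
        y = list(y)
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        self.n_samples_ = len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


est = MeanEstimator()
est.fit([[0], [1]], [2.0, 4.0])
first = est.predict([[5]])
est.fit([[0], [1], [2]], [10.0, 10.0, 10.0])  # refit on a different dataset
second = est.predict([[5]])
```

A refitted instance behaves exactly like a freshly constructed one fitted on the new data, which is the property the grid search refit step relies on.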
2011/11/6 Lars Buitinck :
> 2011/11/6 Sami Liedes :
>> clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS, n_jobs=10)
>> clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True))
>
> Reproduced with a much smaller set of TUNED_PARAMS. With n_jobs=1, the
> problem does not occur.
>
> One solution i
On Sun, Nov 6, 2011 at 11:44 PM, Lars Buitinck wrote:
> One solution is to clone(best_estimator) at grid_search.py, before
> line 341 (in the if self.refit block). Currently, the code assumes
> that a fit estimator can simply be re-fit on a new dataset, which is
> not true of sparse SVMs due to t
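The proposed fix can be sketched as follows. This is a simplified illustration, not the actual `grid_search.py` code: `BrokenEstimator` is a hypothetical estimator that, like the sparse SVMs described here, cannot be re-fit once fitted, and `simple_clone` is a hypothetical stand-in for `sklearn.base.clone` that only rebuilds the estimator from its constructor parameters.

```python
class BrokenEstimator:
    """Hypothetical estimator that violates the fit contract: a second fit fails."""

    def __init__(self, C=1.0):
        self.C = C

    def get_params(self):
        return {"C": self.C}

    def fit(self, X, y):
        if hasattr(self, "fitted_"):
            raise ValueError("cannot re-fit this estimator")
        self.fitted_ = True
        return self


def simple_clone(estimator):
    # Build a new, unfitted instance with the same constructor parameters
    # (a simplified stand-in for sklearn.base.clone).
    return type(estimator)(**estimator.get_params())


def refit_best(best_estimator, X, y, clone_first=True):
    # Final refit of a grid search: best_estimator was already fitted during
    # cross-validation, so refitting it in place fails for BrokenEstimator.
    if clone_first:
        best_estimator = simple_clone(best_estimator)
    return best_estimator.fit(X, y)


X, y = [[0], [1]], [0, 1]
best = BrokenEstimator(C=10.0).fit(X, y)  # fitted during cross-validation
refitted = refit_best(best, X, y)         # works: a fresh clone is fitted
```

Cloning before the refit makes the grid search independent of whether a user-defined estimator honors the contract, at the cost of one extra object construction.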
2011/11/6 Sami Liedes :
> clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS, n_jobs=10)
> clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True))
Reproduced with a much smaller set of TUNED_PARAMS. With n_jobs=1, the
problem does not occur.
One solution is to clone(best_estimator) at grid_sea
On Sun, Nov 06, 2011 at 09:41:31AM +0100, Gael Varoquaux wrote:
> Can you send us a small self-contained script that reproduces your
> problem, and we'll try to fix it ASAP.
Sure. This script reproduces the problem for me:
from sklearn
2011/11/6 Gael Varoquaux :
> On Sun, Nov 06, 2011 at 09:23:14AM +0100, Olivier Grisel wrote:
>> > 1) make indices=True the default in cv objects
>
>> > 2) in check_cv, raise an exception if cv.indices=False and hasattr(X,
>> > "tocsr")
>
>> I think we already for 1) in the past. Let me implement b
On Sun, Nov 06, 2011 at 12:54:36AM +0200, Sami Liedes wrote:
> It seems that sparse.SVC and GridSearchCV don't play along nicely if I
> pass a parameter n_jobs > 1 to GridSearchCV(). At some point I get
> ValueError("cannot resize this array: it does not own its data") from
> inside libsvm.pyx:
Hi
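For reference, the error message quoted above comes from NumPy's in-place `ndarray.resize`, which refuses to resize an array that does not own its memory (for example, a view onto another array). The snippet below only reproduces that message with a plain slice; it is not the code path inside `libsvm.pyx`.

```python
import numpy as np

base = np.arange(4)
view = base[:2]            # a slice is a view: it does not own its memory
print(view.flags.owndata)  # False

try:
    view.resize(3)         # in-place resize requires an array that owns its data
except ValueError as exc:
    message = str(exc)     # "cannot resize this array: it does not own its data"
```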
On Sun, Nov 06, 2011 at 09:23:14AM +0100, Olivier Grisel wrote:
> > 1) make indices=True the default in cv objects
> > 2) in check_cv, raise an exception if cv.indices=False and hasattr(X,
> > "tocsr")
> I think we already for 1) in the past. Let me implement both 1) and
> 2).
+1 for 1). This p
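Idea 2) could look roughly like the following. This is a hypothetical helper, not scikit-learn's actual `check_cv`; it assumes a CV object exposing an `indices` attribute, as the cross-validation classes discussed in this thread did, and uses the `hasattr(X, "tocsr")` test proposed above to detect sparse input. `FakeCV` is a hypothetical stand-in for such a CV object.

```python
import numpy as np
import scipy.sparse as sp


def check_cv_for_sparse(cv, X):
    """Hypothetical guard: reject boolean-mask CV iterators for sparse X."""
    # scipy.sparse matrices expose a tocsr() method; dense ndarrays do not.
    is_sparse = hasattr(X, "tocsr")
    if is_sparse and not getattr(cv, "indices", True):
        raise ValueError(
            "Boolean masks cannot index sparse matrices; "
            "use a cross-validation object with indices=True."
        )
    return cv


class FakeCV:
    # Hypothetical stand-in for a CV object with an `indices` flag.
    def __init__(self, indices):
        self.indices = indices


X_sparse = sp.csr_matrix(np.eye(3))
check_cv_for_sparse(FakeCV(indices=True), X_sparse)   # accepted
check_cv_for_sparse(FakeCV(indices=False), np.eye(3))  # dense input: accepted
```

Failing early with an explicit message is the point of the proposal: the user sees what to change instead of an indexing error deep inside the fit.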
On Sun, Nov 6, 2011 at 1:26 AM, Lars Buitinck wrote:
> 2011/11/6 Sami Liedes :
>> On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote:
>>> With sparse data, you should use the indices=True argument to
>>> StratifiedKFold. By default, it will return a boolean mask, which
>>> cannot be use
2011/11/6 Mathieu Blondel :
>> With sparse data, you should use the indices=True argument to
>> StratifiedKFold. By default, it will return a boolean mask, which
>> cannot be used to index into a sparse matrix.
>
> We really need to do something about this issue, as it keeps popping
> up. A few ide
> With sparse data, you should use the indices=True argument to
> StratifiedKFold. By default, it will return a boolean mask, which
> cannot be used to index into a sparse matrix.
We really need to do something about this issue, as it keeps popping
up. A few ideas:
1) make indices=True the defaul
2011/11/6 Sami Liedes :
> On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote:
>> With sparse data, you should use the indices=True argument to
>> StratifiedKFold. By default, it will return a boolean mask, which
>> cannot be used to index into a sparse matrix.
>
> Ah, didn't know of that
On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote:
> 2011/11/5 Sami Liedes :
> > train,test = iter(StratifiedKFold(DATA.classes, 2)).next()
>
> With sparse data, you should use the indices=True argument to
> StratifiedKFold. By default, it will return a boolean mask, which
> cannot
2011/11/5 Sami Liedes :
> train,test = iter(StratifiedKFold(DATA.classes, 2)).next()
With sparse data, you should use the indices=True argument to
StratifiedKFold. By default, it will return a boolean mask, which
cannot be used to index into a sparse matrix.
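A sketch of the workaround, assuming a boolean train mask like the one the default `indices=False` mode returned: converting the mask to integer positions with `np.flatnonzero` selects the same rows in a form that sparse row indexing accepts. Later scikit-learn releases removed the `indices` argument; their splitters always yield integer indices.

```python
import numpy as np
import scipy.sparse as sp

# Toy sparse feature matrix: 4 samples, 3 features.
X = sp.csr_matrix(np.array([[1, 0, 0],
                            [0, 2, 0],
                            [0, 0, 3],
                            [4, 0, 0]]))

# Boolean mask selecting the training rows (what indices=False returned).
train_mask = np.array([True, False, True, False])

# Equivalent integer positions (what indices=True returned).
train_idx = np.flatnonzero(train_mask)  # array([0, 2])

X_train = X[train_idx]                  # row selection on the sparse matrix
print(X_train.toarray())
```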
> # DATA.features is a sparse ma
Hi!
This looks like a bug to me, but since I'm new to sklearn, I thought
I'd ask first if I'm doing something wrong before reporting a bug.
It seems that sparse.SVC and GridSearchCV don't play along nicely if I
pass a parameter n_jobs > 1 to GridSearchCV(). At some point I get
ValueError("cannot