Hi!
This looks like a bug to me, but since I'm new to sklearn, I thought
I'd ask whether I'm doing something wrong before reporting it.
It seems that svm.sparse.SVC and GridSearchCV don't work together if I
pass n_jobs > 1 to GridSearchCV(). At some point I get
ValueError("cannot resize this array: it does not own its data") from
inside libsvm.pyx:
------------------------------------------------------------
$ python proj.py
Traceback (most recent call last):
  File "proj.py", line 79, in <module>
    main()
  File "proj.py", line 62, in main
    train_svc()
  File "proj.py", line 49, in train_svc
    clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))
  File "/usr/lib/pymodules/python2.7/sklearn/grid_search.py", line 322, in fit
    best_estimator.fit(X, y, **self.fit_params)
  File "/usr/lib/pymodules/python2.7/sklearn/svm/sparse/base.py", line 112, in fit
    int(self.shrinking), int(self.probability))
  File "libsvm.pyx", line 157, in sklearn.svm.sparse.libsvm.libsvm_sparse_train (sklearn/svm/sparse/libsvm.c:1924)
ValueError: cannot resize this array: it does not own its data
------------------------------------------------------------
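For reference, the ValueError text is a NumPy message and can be reproduced without sklearn: ndarray.resize() refuses to work in place on an array that is only a view of another array's buffer. The sketch below shows just that mechanism. Whether the arrays reaching libsvm_sparse_train under n_jobs > 1 really are such views is an assumption, not something the traceback confirms, and the helper name try_resize is made up for illustration.

```python
import numpy as np


def try_resize(arr, new_size):
    """Try to resize arr in place; return None on success, else the error text."""
    try:
        # refcheck=False skips the reference-count check, so only the
        # ownership check can make this call fail.
        arr.resize(new_size, refcheck=False)
    except ValueError as exc:
        return str(exc)
    return None


owner = np.arange(10)   # allocates and owns its buffer
view = owner[:5]        # a slice shares the owner's buffer

print(owner.flags.owndata)          # True
print(view.flags.owndata)           # False
print(try_resize(view, 8))          # error text: the view does not own its data
print(try_resize(view.copy(), 8))   # None: a copy owns its data and can grow
```

In this toy case, copying the array before resizing avoids the error; whether a copy at the right place would also help inside sklearn is untested here.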
My code is modeled after the example at
http://scikit-learn.sourceforge.net/stable/auto_examples/grid_search_digits.html
Here's the function:
------------------------------------------------------------
def train_svc():
    TUNED_PARAMS = [
        {'kernel': ['rbf'], 'gamma': 10.0**scipy.arange(-2,-5,-.5),
         'C': 10.0**scipy.arange(0,4,.5)},
        {'kernel': ['linear'], 'C': 10.0**scipy.arange(0,4,.5)}]
    SCORES = [('precision', precision_score),
              ('recall', recall_score)]
    train,test = iter(StratifiedKFold(DATA.classes, 2)).next()
    # DATA.features is a sparse matrix in csr format.
    # DATA.features[train] just returns the entire array for some reason...
    tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]
    for score_name, score_func in SCORES:
        clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS,
                           score_func=score_func, n_jobs=10)
        clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))
        c_true = DATA.classes[test]
        c_pred = clf.predict(DATA.features.toarray()[test])
        print "Best estimator: "+str(clf.best_estimator)
        print "Tuned for '%s' with optimal value %f" % (
            score_name, score_func(c_true, c_pred))
        print classification_report(c_true, c_pred)
        print "Grid scores:"
        pprint(clf.grid_scores_)
------------------------------------------------------------
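On the comment about DATA.features[train] returning the whole matrix: one possible explanation (an assumption, not verified against the sklearn and scipy versions used here) is that train is a boolean mask and the sparse matrix does not treat it the way a dense array would. A sketch that converts the mask to integer row indices first, which avoids the round trip through toarray(); select_rows and the toy data are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix


def select_rows(matrix, mask):
    """Return the rows of a CSR matrix where the boolean mask is True."""
    # Turn the mask into explicit integer row indices before indexing.
    row_indices = np.flatnonzero(mask)
    return matrix[row_indices]


features = csr_matrix(np.array([[1, 0, 0],
                                [0, 2, 0],
                                [0, 0, 3],
                                [4, 0, 0]]))
train_mask = np.array([True, False, True, False])

train_features = select_rows(features, train_mask)
print(train_features.shape)      # (2, 3)
print(train_features.toarray())  # rows 0 and 2 of the original matrix
```

Integer-array row indexing on CSR matrices depends on the installed scipy version, so this may not apply to older releases.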
If I change the code to use svm.SVC instead of svm.sparse.SVC, I don't
get the error, but since my data is sparse enough, I'm actually better
off using svm.sparse.SVC with n_jobs=1...
Sami
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general