[Scikit-learn-general] Parallel GridSearchCV on sparse.SVC fails with ValueError

Sami Liedes Sat, 05 Nov 2011 15:55:06 -0700

Hi!

This looks like a bug to me, but since I'm new to sklearn, I thought
I'd ask first if I'm doing something wrong before reporting a bug.


It seems that sparse.SVC and GridSearchCV don't play nicely together
if I pass n_jobs > 1 to GridSearchCV(). At some point I get
ValueError("cannot resize this array: it does not own its data") from
inside libsvm.pyx:

------------------------------------------------------------
$ python proj.py
Traceback (most recent call last):
  File "proj.py", line 79, in <module>
    main()
  File "proj.py", line 62, in main
    train_svc()
  File "proj.py", line 49, in train_svc
    clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))
  File "/usr/lib/pymodules/python2.7/sklearn/grid_search.py", line 322, in fit
    best_estimator.fit(X, y, **self.fit_params)
  File "/usr/lib/pymodules/python2.7/sklearn/svm/sparse/base.py", line 112, in fit
    int(self.shrinking), int(self.probability))
  File "libsvm.pyx", line 157, in sklearn.svm.sparse.libsvm.libsvm_sparse_train (sklearn/svm/sparse/libsvm.c:1924)
ValueError: cannot resize this array: it does not own its data
------------------------------------------------------------
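
For reference, NumPy raises this exact message when ndarray.resize() is
called in place on an array that does not own its buffer, such as a view.
The sketch below only reproduces the message in isolation. It does not
show what libsvm.pyx actually does, and whether the arrays reaching the
worker processes are views (or otherwise lack ownership) is an assumption,
not something the traceback confirms. The names base, view and owned are
made up for illustration.

```python
import numpy as np

base = np.arange(10)
view = base[:5]  # a view: shares base's buffer, so view.flags.owndata is False

try:
    view.resize(3)  # in-place resize needs to reallocate the buffer
    message = None
except ValueError as exc:
    message = str(exc)

print(message)

# An array that owns its data can be resized in place.
owned = view.copy()
owned.resize(3, refcheck=False)
```

Copying the array first gives it its own buffer, which is why the last two
lines succeed.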

My code is modeled after the example at
http://scikit-learn.sourceforge.net/stable/auto_examples/grid_search_digits.html
Here's the function:

------------------------------------------------------------
def train_svc():
    TUNED_PARAMS = [
        {'kernel': ['rbf'], 'gamma': 10.0**scipy.arange(-2,-5,-.5),
         'C': 10.0**scipy.arange(0,4,.5)},
        {'kernel': ['linear'], 'C': 10.0**scipy.arange(0,4,.5)}]

    SCORES = [('precision', precision_score),
              ('recall', recall_score)]

    train,test = iter(StratifiedKFold(DATA.classes, 2)).next()

    # DATA.features is a sparse matrix in csr format.
    # DATA.features[train] just returns the entire array for some reason...
    tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]
    for score_name, score_func in SCORES:
        clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS,
                           score_func=score_func, n_jobs=10)
        clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))
        c_true = DATA.classes[test]
        c_pred = clf.predict(DATA.features.toarray()[test])
        print "Best estimator: "+str(clf.best_estimator)
        print "Tuned for '%s' with optimal value %f" % (
            score_name, score_func(c_true, c_pred))
        print classification_report(c_true, c_pred)
        print "Grid scores:"
        pprint(clf.grid_scores_)
------------------------------------------------------------
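
On the comment in the function above about DATA.features[train]: one
possible workaround that avoids the round trip through toarray() is to
index the CSR matrix with integer row indices instead of a boolean mask.
This is a minimal sketch that assumes train is a boolean mask, which the
post does not state. X, mask and rows are illustrative names, and the
row-indexing behaviour shown is that of a reasonably recent scipy.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small stand-in for DATA.features: 4 samples, 3 features.
X = csr_matrix(np.arange(12).reshape(4, 3))

# Assumed form of a train/test split: a boolean mask over the samples.
mask = np.array([True, False, True, False])

# Convert the mask to integer row indices and select rows directly;
# the result is still a sparse matrix.
rows = np.flatnonzero(mask)
X_sub = X[rows]

print(X_sub.shape)
```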

If I change the code to use svm.SVC instead of svm.sparse.SVC, I don't
get the error, but my data is sparse enough that I'm actually better
off using svm.sparse.SVC with n_jobs=1...

        Sami
