On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote:
> 2011/11/5 Sami Liedes <[email protected]>:
> >    train,test = iter(StratifiedKFold(DATA.classes, 2)).next()
> 
> With sparse data, you should use the indices=True argument to
> StratifiedKFold. By default, it will return a boolean mask, which
> cannot be used to index into a sparse matrix.

Ah, I didn't know about that option. Thanks. (Why doesn't indexing into
a sparse matrix with a boolean mask fail with an error message, rather
than silently returning the entire matrix?)

> >    # DATA.features is a sparse matrix in csr format.
> >    # DATA.features[train] just returns the entire array for some reason...
> >    tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]
> 
> DATA.features.toarray() actually densifies your sample vectors...

Yes, that was the idea, since indexing it with a boolean mask did not
work. Obviously indices=True is a better solution ;)
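
For reference, a minimal sketch of the integer-index route without densifying. The toy matrix and the hand-written mask are made up for illustration and merely stand in for DATA.features and a fold mask. It converts the boolean mask to row indices with NumPy and uses those to index the CSR matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for DATA.features and a boolean fold mask.
X = csr_matrix(np.arange(12).reshape(4, 3))
mask = np.array([True, False, True, False])

# Turn the boolean mask into integer row indices.
rows = np.flatnonzero(mask)

# Row-index the sparse matrix directly; the result stays sparse.
X_train = X[rows]
```

This keeps the data sparse throughout, which is the point of asking the cross-validation iterator for indices in the first place.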

Thank you for these suggestions! It's always enlightening to learn.

Unfortunately this still doesn't resolve the problem with sparse.SVC
and GridSearchCV(n_jobs=10); I still get the same error message:

------------------------------------------------------------
$ python proj.py
Traceback (most recent call last):
  File "proj.py", line 102, in <module>
    main()
  File "proj.py", line 80, in main
    train_svc()
  File "proj.py", line 65, in train_svc
    grid_search(svm.sparse.SVC(C=1), TUNED_PARAMS)
  File "proj.py", line 46, in grid_search
    clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True))
  File "/usr/lib/pymodules/python2.7/sklearn/grid_search.py", line 322, in fit
    best_estimator.fit(X, y, **self.fit_params)
  File "/usr/lib/pymodules/python2.7/sklearn/svm/sparse/base.py", line 112, in fit
    int(self.shrinking), int(self.probability))
  File "libsvm.pyx", line 157, in sklearn.svm.sparse.libsvm.libsvm_sparse_train (sklearn/svm/sparse/libsvm.c:1924)
ValueError: cannot resize this array: it does not own its data
------------------------------------------------------------
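
As far as I can tell, the ValueError text itself comes from NumPy's in-place ndarray.resize, which refuses to resize an array that is a view onto someone else's buffer. The sketch below only reproduces that message in isolation with made-up arrays. It is not a diagnosis of what libsvm_sparse_train does internally, and whether a copy would help in the GridSearchCV case is an assumption I have not checked:

```python
import numpy as np

base = np.arange(10)
view = base[:5]  # a slice is a view: it does not own its data

# In-place resize of a view is rejected by NumPy.
try:
    view.resize(8)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# A copy owns its buffer, so the same call succeeds.
owned = view.copy()
owned.resize(8, refcheck=False)
```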

Sami

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general