On Sun, Nov 06, 2011 at 12:22:37AM +0100, Lars Buitinck wrote:
> 2011/11/5 Sami Liedes <[email protected]>:
> > train,test = iter(StratifiedKFold(DATA.classes, 2)).next()
>
> With sparse data, you should use the indices=True argument to
> StratifiedKFold. By default, it will return a boolean mask, which
> cannot be used to index into a sparse matrix.

Ah, I didn't know about that option. Thanks. (Why does indexing into a
sparse matrix with a boolean mask not fail with an error message, but
instead silently return the entire matrix?)

> > # DATA.features is a sparse matrix in csr format.
> > # DATA.features[train] just returns the entire array for some reason...
> > tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]
>
> DATA.features.toarray() actually densifies your sample vectors...

Yes, that was the idea, since indexing it with a boolean mask did not
work. Obviously indices=True is a better solution ;)
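
For reference, a minimal sketch of the index-based approach, using only
NumPy and SciPy so it does not depend on any particular StratifiedKFold
signature. The toy `features`, `classes` and `train_mask` values are
made-up stand-ins for DATA.features, DATA.classes and the mask returned
by the splitter; the point is only that integer row indices select rows
of a CSR matrix without densifying it.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical stand-ins for DATA.features (CSR) and DATA.classes.
features = csr_matrix(np.arange(12, dtype=float).reshape(6, 2))
classes = np.array([0, 0, 0, 1, 1, 1])

# A boolean mask of the kind a cross-validation splitter may hand back.
train_mask = np.array([True, False, True, True, False, True])

# Turn the mask into integer row indices before touching the sparse matrix.
train_idx = np.flatnonzero(train_mask)

# Row selection by integer indices keeps the result sparse,
# so no toarray() round trip is needed.
tf = features[train_idx]
tc = classes[train_idx]
```

This is essentially what asking the splitter for indices instead of a
mask achieves, just done by hand.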

Thank you for these suggestions! It's always enlightening to learn.

Unfortunately, this still doesn't resolve the problem with sparse.SVC
and GridSearchCV(n_jobs=10); I still get the same error message:
------------------------------------------------------------
$ python proj.py
Traceback (most recent call last):
  File "proj.py", line 102, in <module>
    main()
  File "proj.py", line 80, in main
    train_svc()
  File "proj.py", line 65, in train_svc
    grid_search(svm.sparse.SVC(C=1), TUNED_PARAMS)
  File "proj.py", line 46, in grid_search
    clf.fit(tf, tc, cv=StratifiedKFold(tc, 5, indices=True))
  File "/usr/lib/pymodules/python2.7/sklearn/grid_search.py", line 322, in fit
    best_estimator.fit(X, y, **self.fit_params)
  File "/usr/lib/pymodules/python2.7/sklearn/svm/sparse/base.py", line 112, in fit
    int(self.shrinking), int(self.probability))
  File "libsvm.pyx", line 157, in sklearn.svm.sparse.libsvm.libsvm_sparse_train (sklearn/svm/sparse/libsvm.c:1924)
ValueError: cannot resize this array: it does not own its data
------------------------------------------------------------
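
The ValueError text itself is the one NumPy raises when resize() is
called in place on an array that does not own its buffer. Whether that
is what happens inside libsvm_sparse_train is only a guess; the sketch
below just reproduces the condition in isolation with made-up arrays
(`base`, `view` and `owned` are illustrative names, not anything from
proj.py or scikit-learn).

```python
import numpy as np

base = np.arange(10)
view = base[:5]          # a slice is a view: it borrows base's buffer

# In-place resize is refused for arrays that do not own their data.
try:
    view.resize(10)
    error = None
except ValueError as exc:
    error = str(exc)

# A copy owns its buffer, so the same call is allowed on it.
# refcheck=False skips the reference-count check, which is unrelated
# to the ownership check that failed above.
owned = view.copy()
owned.resize(10, refcheck=False)
```

If the array reaching the sparse SVC is such a view (or is backed by
memory handed over from another process), taking an explicit copy
before fitting might be worth a try, but that is an assumption, not a
confirmed diagnosis.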
Sami
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general