2011/11/5 Sami Liedes <[email protected]>:
>    train,test = iter(StratifiedKFold(DATA.classes, 2)).next()

With sparse data, you should pass the indices=True argument to
StratifiedKFold. By default it returns boolean masks, and a boolean
mask cannot be used to index into a sparse matrix. That is why
DATA.features[train] doesn't give you the rows you expect.
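A minimal sketch of the difference, using plain NumPy/SciPy rather than the StratifiedKFold API itself (the toy matrix and mask below are assumptions for illustration). It turns a boolean train mask into the integer row indices that indices=True hands you, then slices the CSR matrix with them. The result stays sparse, so no densifying is needed, and integer indices work whether or not your SciPy version accepts boolean masks.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy sparse feature matrix: 6 samples x 4 features (assumed data).
X = csr_matrix(np.array([
    [0, 1, 0, 0],
    [2, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 4],
    [5, 0, 0, 0],
    [0, 6, 0, 0],
]))

# A boolean train mask, like the default (indices=False) output.
train_mask = np.array([True, False, True, False, True, False])

# Convert the mask to integer row indices, i.e. what indices=True gives you.
train_idx = np.flatnonzero(train_mask)

# Integer row indexing keeps the result a sparse CSR matrix.
X_train = X[train_idx]
```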

>    # DATA.features is a sparse matrix in csr format.
>    # DATA.features[train] just returns the entire array for some reason...
>    tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]

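DATA.features.toarray() converts the whole sparse matrix into a dense
array, so you lose the benefit of sparse storage (and can easily run
out of memory with a large feature set). Index the sparse matrix
directly with integer indices instead.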
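To give a feel for the cost, here is a rough sketch under assumed sizes (1,000 samples x 10,000 features at 0.1% density, which is roughly what a bag-of-words problem looks like; these numbers are not from your data). It compares the memory held by the CSR arrays with the dense copy that toarray() builds, and checks that slicing rows directly gives the same values without the dense intermediate.

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Assumed sizes: 1,000 samples x 10,000 features at 0.1% density.
X = sparse_random(1000, 10000, density=0.001, format="csr", random_state=0)

# Memory actually used by the CSR representation.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
# Dense float64 copy: 1000 * 10000 * 8 = 80,000,000 bytes.
dense_bytes = X.toarray().nbytes

print(f"CSR:   {sparse_bytes:,} bytes")
print(f"dense: {dense_bytes:,} bytes")

# Row slicing on the sparse matrix gives the same values as
# densify-then-slice, without ever building the dense matrix.
rows = np.arange(0, 1000, 2)
same_values = (X[rows].toarray() == X.toarray()[rows]).all()
```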

>    for score_name, score_func in SCORES:
>        clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS,
>                           score_func=score_func, n_jobs=10)
>        clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))

Again, pass indices=True to StratifiedKFold here.
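If it helps to see what "stratified folds as integer indices" means, here is a small sketch. stratified_fold_indices is a hypothetical helper written for illustration, not scikit-learn's implementation: it deals each class's samples round-robin over the folds so every fold keeps roughly the class proportions, then yields (train, test) integer index arrays that can be used directly on a CSR matrix.

```python
import numpy as np


def stratified_fold_indices(y, n_folds):
    """Yield (train_idx, test_idx) integer arrays, stratified by label.

    Hypothetical helper for illustration; not scikit-learn's code.
    """
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        # Deal this class's samples round-robin over the folds.
        fold_of[members] = np.arange(len(members)) % n_folds
    for k in range(n_folds):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
folds = list(stratified_fold_indices(y, 2))
```

With these labels, the first test fold is [0, 2, 4, 6, 8]: two samples of class 0 and three of class 1, the same 2:3 ratio as the full label set.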

>        c_true = DATA.classes[test]
>        c_pred = clf.predict(DATA.features.toarray()[test])

And again, toarray() is a waste here, and you're doing it inside the
loop, so the whole matrix gets densified once per scoring function!
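A sketch of the loop restructured so the test rows are sliced once, outside the loop, and kept sparse. To keep it self-contained, the fitted classifiers are replaced by hypothetical linear weight vectors (one per scoring function name); the point is only the data handling. A sparse matrix-vector product gives the same decision values as the dense one, so nothing inside the loop needs toarray().

```python
import numpy as np
from scipy.sparse import random as sparse_random

# Assumed setup: a sparse CSR matrix and a fixed test split.
X = sparse_random(200, 500, density=0.01, format="csr", random_state=1)
test_idx = np.arange(150, 200)

# Hypothetical stand-ins for the fitted classifiers: one linear
# weight vector per scoring-function name.
rng = np.random.default_rng(0)
models = {name: rng.standard_normal(500) for name in ("precision", "recall")}

# Slice the test rows once, outside the loop, and keep them sparse.
X_test = X[test_idx]

predictions = {}
for name, w in models.items():
    # Sparse matrix-vector product; no toarray() inside the loop.
    predictions[name] = np.sign(X_test @ w)
```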


HTH and good luck,

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
