2011/11/5 Sami Liedes <[email protected]>:
> train,test = iter(StratifiedKFold(DATA.classes, 2)).next()

With sparse data, you should use the indices=True argument to
StratifiedKFold. By default, it will return a boolean mask, which cannot
be used to index into a sparse matrix.

> # DATA.features is a sparse matrix in csr format.
> # DATA.features[train] just returns the entire array for some reason...
> tf, tc = csr_matrix(DATA.features.toarray()[train]), DATA.classes[train]

DATA.features.toarray() actually densifies your sample vectors...

> for score_name, score_func in SCORES:
>     clf = GridSearchCV(svm.sparse.SVC(C=1), TUNED_PARAMS,
>                        score_func=score_func, n_jobs=10)
>     clf.fit(tf, tc, cv=StratifiedKFold(tc, 5))

Again, indices=True

>     c_true = DATA.classes[test]
>     c_pred = clf.predict(DATA.features.toarray()[test])

And again, toarray() is a waste (and you're doing it in a loop!)

HTH and good luck,

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
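
To illustrate the difference between a boolean mask and integer indices, here is a minimal sketch, assuming NumPy and SciPy are available. The toy matrix, the labels, and the names features, classes, train_mask and train_idx are made up for illustration. np.flatnonzero stands in for what indices=True would hand back directly: it turns the mask into row indices, so the rows can be selected without calling toarray().

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

# Hypothetical stand-ins for DATA.features and DATA.classes.
features = csr_matrix(np.array([[1.0, 0.0, 0.0],
                                [0.0, 2.0, 0.0],
                                [0.0, 0.0, 3.0],
                                [4.0, 0.0, 0.0],
                                [0.0, 5.0, 0.0],
                                [0.0, 0.0, 6.0]]))
classes = np.array([0, 1, 0, 1, 0, 1])

# A boolean mask of the kind the cross-validation iterator returned
# by default (True marks a training sample).
train_mask = np.array([True, True, False, False, True, False])

# Convert the mask to integer row indices; this is the form that
# indices=True is meant to provide.
train_idx = np.flatnonzero(train_mask)

# Row selection with integer indices keeps the matrix sparse,
# so no densifying toarray() call is needed.
train_features = features[train_idx]
train_classes = classes[train_idx]
```

The same integer indices can be reused for the test split, which avoids densifying the feature matrix inside the scoring loop.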
